szehon-ho commented on code in PR #4898:
URL: https://github.com/apache/iceberg/pull/4898#discussion_r911461326
##########
core/src/test/java/org/apache/iceberg/TestManifestReader.java:
##########
@@ -142,9 +147,61 @@ public void testDeleteFilePositions() throws IOException {
long expectedPos = 0L;
for (DeleteFile file : reader) {
Assert.assertEquals("Position should match", (Long) expectedPos,
file.pos());
- Assert.assertEquals("Position from field index should match",
expectedPos, ((BaseFile) file).get(17));
+ Assert.assertEquals("Position from field index should match",
expectedPos, ((BaseFile) file).get(18));
expectedPos += 1;
}
}
}
+
+ @Test
+ public void testManifestCompatible() throws IOException {
+ Schema schema = new Schema(
+ required(1, "id", Types.LongType.get()),
+ required(2, "timestamp", Types.TimestampType.withZone()),
+ required(3, "category", Types.StringType.get()),
+ required(4, "data", Types.StringType.get()),
+ required(5, "double", Types.DoubleType.get()));
+
+ PartitionSpec spec = PartitionSpec.builderFor(schema)
+ .identity("category")
+ .hour("timestamp")
+ .bucket("id", 16)
+ .build();
+ PartitionData partition = DataFiles.data(spec,
"category=cheesy/timestamp_hour=10/id_bucket=3");
+
+ String path =
"s3://bucket/table/category=cheesy/timestamp_hour=10/id_bucket=3/file.avro";
+ long fileSize = 150972L;
+ FileFormat format = FileFormat.AVRO;
+
+ Metrics metrics = new Metrics(
+ 1587L,
+ ImmutableMap.of(1, 15L, 2, 122L, 3, 4021L, 4, 9411L, 5, 15L), // sizes
+ ImmutableMap.of(1, 100L, 2, 100L, 3, 100L, 4, 100L, 5, 100L), //
value counts
+ ImmutableMap.of(1, 0L, 2, 0L, 3, 0L, 4, 0L, 5, 0L), // null value
counts
+ ImmutableMap.of(5, 10L), // nan value counts
+ ImmutableMap.of(1, Conversions.toByteBuffer(Types.IntegerType.get(),
1)), // lower bounds
+ ImmutableMap.of(1, Conversions.toByteBuffer(Types.IntegerType.get(),
1))); // upper bounds
+ Integer sortOrderId = 2;
+
+ String fileName = String.format("OldManifestFileV%s.avro", formatVersion);
Review Comment:
This is really great to have this test. I was initially thinking that we
want to have a TestV1Writer / TestV2Writer that writes all the optional fields
as null, instead of checking in the old version avro file. Is that possible?
I can also take a look myself if that is possible.
##########
core/src/test/java/org/apache/iceberg/TestManifestReader.java:
##########
@@ -142,9 +147,61 @@ public void testDeleteFilePositions() throws IOException {
long expectedPos = 0L;
for (DeleteFile file : reader) {
Assert.assertEquals("Position should match", (Long) expectedPos,
file.pos());
- Assert.assertEquals("Position from field index should match",
expectedPos, ((BaseFile) file).get(17));
+ Assert.assertEquals("Position from field index should match",
expectedPos, ((BaseFile) file).get(18));
expectedPos += 1;
}
}
}
+
+ @Test
+ public void testManifestCompatible() throws IOException {
+ Schema schema = new Schema(
+ required(1, "id", Types.LongType.get()),
+ required(2, "timestamp", Types.TimestampType.withZone()),
+ required(3, "category", Types.StringType.get()),
+ required(4, "data", Types.StringType.get()),
+ required(5, "double", Types.DoubleType.get()));
+
+ PartitionSpec spec = PartitionSpec.builderFor(schema)
+ .identity("category")
+ .hour("timestamp")
+ .bucket("id", 16)
+ .build();
+ PartitionData partition = DataFiles.data(spec,
"category=cheesy/timestamp_hour=10/id_bucket=3");
+
+ String path =
"s3://bucket/table/category=cheesy/timestamp_hour=10/id_bucket=3/file.avro";
Review Comment:
Is this used?
##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -54,6 +54,7 @@ public PartitionData copy() {
private Types.StructType partitionType;
private Long fileOrdinal = null;
+ private int schemaId = -1;
Review Comment:
In spec its optional, so trying to understand, why don't we use Integer?
##########
core/src/main/java/org/apache/iceberg/deletes/EqualityDeleteWriter.java:
##########
@@ -42,18 +42,26 @@
private final ByteBuffer keyMetadata;
private final int[] equalityFieldIds;
private final SortOrder sortOrder;
+ private final int schemaId;
private DeleteFile deleteFile = null;
public EqualityDeleteWriter(FileAppender<T> appender, FileFormat format,
String location,
PartitionSpec spec, StructLike partition,
EncryptionKeyMetadata keyMetadata,
SortOrder sortOrder, int... equalityFieldIds) {
+ this(appender, format, location, spec, partition, keyMetadata, sortOrder,
-1, equalityFieldIds);
Review Comment:
It's not that great to have to add a new constructor to all these. As we
already have WriterFactory that abstract it and should be the ones getting
called, I wonder if we can just change this interface? cc @rdblue @aokolnychyi
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]