pvary commented on code in PR #15633:
URL: https://github.com/apache/iceberg/pull/15633#discussion_r2939066246
##########
data/src/test/java/org/apache/iceberg/data/BaseFormatModelTests.java:
##########
@@ -317,6 +314,305 @@ void
testPositionDeleteWriterEngineWriteGenericRead(FileFormat fileFormat) throw
DataTestHelpers.assertEquals(positionDeleteSchema.asStruct(), records,
readRecords);
}
+ @ParameterizedTest
+ @FieldSource("FORMAT_AND_GENERATOR")
+ /** Write with Generic Record, read with projected engine type T (narrow
schema) */
+ void testReaderBuilderProjection(FileFormat fileFormat, DataGenerator
dataGenerator)
+ throws IOException {
+ Schema fullSchema = dataGenerator.schema();
+
+ List<Types.NestedField> columns = fullSchema.columns();
+ Schema projectedSchema = new Schema(columns.get(columns.size() - 1));
+
+ List<Record> genericRecords = dataGenerator.generateRecords();
+ writeGenericRecords(fileFormat, fullSchema, genericRecords);
+
+ List<Record> projectedGenericRecords = projectRecords(genericRecords,
projectedSchema);
+ List<T> expectedEngineRecords =
+ convertToEngineRecords(projectedGenericRecords, projectedSchema);
+
+ InputFile inputFile = encryptedFile.encryptingOutputFile().toInputFile();
+ List<T> readRecords;
+ try (CloseableIterable<T> reader =
+ FormatModelRegistry.readBuilder(fileFormat, engineType(), inputFile)
+ .project(projectedSchema)
+ .engineProjection(engineSchema(projectedSchema))
+ .build()) {
+ readRecords = ImmutableList.copyOf(reader);
+ }
+
+ assertEquals(projectedSchema, expectedEngineRecords, readRecords);
+ }
+
+ @ParameterizedTest
+ @FieldSource("FORMAT_AND_GENERATOR")
+ void testReaderBuilderFilter(FileFormat fileFormat, DataGenerator
dataGenerator)
+ throws IOException {
+
+ // Avro does not support filter push down
+ // Skip this test for Avro to avoid false failures.
+ assumeThat(fileFormat != FileFormat.AVRO).isTrue();
Review Comment:
I think this is worth a letter to the dev list about how to handle the
missing features. We should collect as many examples as we can and ask the
community what to do about them.
I don't like hiding them here.
1. Maybe the FileFormat enum can contain some methods like `supportsFilter`,
`supportsCaseSensitive`?
2. Maybe we can have a matrix at the beginning of the test case like
`MISSING_FEATURES = ImmutableMap.of(FileFormat.AVRO, new String[] {"filter",
"caseSensitive"});`
3. Maybe a separate test class for every format instead of storing this
directly in the FileFormat enum?
I prefer 1 or 2, but I'm open to other suggestions.
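
To make options 1 and 2 concrete, here is a rough, self-contained sketch, not a proposal for the final API. Everything in it is hypothetical: `Format` stands in for `FileFormat` so the snippet compiles on its own, the `Feature` enum and the `supports` helper do not exist in Iceberg today, and the AVRO entries only mirror the examples above rather than a verified list of gaps.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class MissingFeaturesSketch {
  // Hypothetical stand-in for org.apache.iceberg.FileFormat (assumption, keeps the sketch standalone).
  enum Format { AVRO, ORC, PARQUET }

  // Hypothetical feature names, taken from the examples in this comment.
  enum Feature { FILTER, CASE_SENSITIVE }

  // Option 2: one visible matrix of the features each format is known to lack.
  static final Map<Format, Set<Feature>> MISSING_FEATURES = buildMissingFeatures();

  private static Map<Format, Set<Feature>> buildMissingFeatures() {
    Map<Format, Set<Feature>> missing = new EnumMap<>(Format.class);
    // Illustrative entries only; the real list would come from the dev list discussion.
    missing.put(Format.AVRO, EnumSet.of(Feature.FILTER, Feature.CASE_SENSITIVE));
    return Collections.unmodifiableMap(missing);
  }

  // Option 1 flavor: a single query method, which could equally live on the enum itself.
  static boolean supports(Format format, Feature feature) {
    // Formats without an entry in the matrix are treated as supporting every feature.
    return !MISSING_FEATURES.getOrDefault(format, Collections.emptySet()).contains(feature);
  }
}
```

With something like this, the test could say `assumeThat(supports(fileFormat, Feature.FILTER)).isTrue();` and the full list of skipped combinations would be readable in one place instead of being scattered across test bodies.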
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]