flyrain commented on code in PR #4539:
URL: https://github.com/apache/iceberg/pull/4539#discussion_r856594251
##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java:
##########
@@ -70,8 +70,7 @@ public class TestSparkParquetReadMetadataColumns {
private static final Schema PROJECTION_SCHEMA = new Schema(
required(100, "id", Types.LongType.get()),
required(101, "data", Types.StringType.get()),
- MetadataColumns.ROW_POSITION,
- MetadataColumns.IS_DELETED
Review Comment:
Some tests (e.g., `testReadRowNumbersWithDelete`) failed because they use
`PROJECTION_SCHEMA`, which contains `IS_DELETED`. The expected row count was
wrong because the tests still assume that only undeleted rows are returned,
whereas with the new logic both deleted and undeleted rows are returned. We can
either remove the `IS_DELETED` column so that deleted rows won't be in the
results, or change the test cases to filter the deleted rows out of the
results. I chose the former because it fixes multiple failures at once.
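To illustrate the two options, here is a minimal sketch in plain Java. It does not use the Iceberg API: `Row`, `read`, and `liveCount` are hypothetical stand-ins, and the only behavior assumed is the one described above (deleted rows are returned and marked when `IS_DELETED` is projected, and dropped otherwise).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IsDeletedProjectionSketch {
  // Hypothetical stand-in for a row read from a data file.
  static final class Row {
    final long pos;
    final boolean isDeleted;

    Row(long pos, boolean isDeleted) {
      this.pos = pos;
      this.isDeleted = isDeleted;
    }
  }

  // Hypothetical reader. When projectIsDeleted is true, deleted rows are kept
  // and marked; when it is false, deleted rows are dropped from the result.
  static List<Row> read(int numRows, Set<Long> deletedPositions, boolean projectIsDeleted) {
    List<Row> result = new ArrayList<>();
    for (long pos = 0; pos < numRows; pos++) {
      boolean deleted = deletedPositions.contains(pos);
      if (deleted && !projectIsDeleted) {
        continue;
      }
      result.add(new Row(pos, deleted));
    }
    return result;
  }

  // Option 2: keep the flag projected and count only the undeleted rows.
  static long liveCount(List<Row> rows) {
    return rows.stream().filter(row -> !row.isDeleted).count();
  }

  public static void main(String[] args) {
    Set<Long> deleted = new HashSet<>(Arrays.asList(1L, 3L));
    // Flag projected: all rows come back, deleted ones are marked.
    System.out.println(read(5, deleted, true).size());
    // Option 1: flag not projected, deleted rows are dropped.
    System.out.println(read(5, deleted, false).size());
    // Option 2: flag projected, test filters on the flag.
    System.out.println(liveCount(read(5, deleted, true)));
  }
}
```

Option 1 needs a change only in the shared schema, whereas option 2 needs a filter in every test that asserts on the row count.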
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]