flyrain commented on code in PR #4539:
URL: https://github.com/apache/iceberg/pull/4539#discussion_r856594251


##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java:
##########
@@ -70,8 +70,7 @@ public class TestSparkParquetReadMetadataColumns {
   private static final Schema PROJECTION_SCHEMA = new Schema(
       required(100, "id", Types.LongType.get()),
       required(101, "data", Types.StringType.get()),
-      MetadataColumns.ROW_POSITION,
-      MetadataColumns.IS_DELETED

Review Comment:
   Some tests (e.g., `testReadRowNumbersWithDelete`) failed because they use 
`PROJECTION_SCHEMA`, which contains the `IS_DELETED` column. The expected row 
count was wrong: the tests still assume only undeleted rows are returned, but 
with the new logic both deleted and undeleted rows come back. 
   To fix this, we can either remove the `IS_DELETED` column so that deleted 
rows won't appear in the results, or change the test cases to filter the 
deleted rows out of the results. I chose the former so that it fixes multiple 
failures at once.
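   For reference, the alternative fix (filtering deleted rows inside the test) would look roughly like the sketch below. It uses a plain record and Java streams standing in for Spark's `InternalRow` and the `_deleted` metadata column, so the names (`Row`, `filterDeleted`) are illustrative, not the actual test code:

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterDeletedSketch {
  // Hypothetical stand-in for Spark's InternalRow; isDeleted mirrors
  // the value the reader would surface in the _deleted metadata column.
  record Row(long id, String data, long rowPos, boolean isDeleted) {}

  // Drop rows the reader marked as deleted before comparing against the
  // expected (undeleted-only) rows, instead of changing the projection.
  static List<Row> filterDeleted(List<Row> rows) {
    return rows.stream()
        .filter(r -> !r.isDeleted())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Row> actual = List.of(
        new Row(1L, "a", 0L, false),
        new Row(2L, "b", 1L, true),   // surfaced by the new logic
        new Row(3L, "c", 2L, false));
    System.out.println(filterDeleted(actual).size()); // prints 2
  }
}
```

   The downside of this approach is that every affected test case would need the same filtering step, which is why dropping `IS_DELETED` from the projection is the smaller change.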
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
