[GitHub] [iceberg] flyrain commented on a diff in pull request #4888: Core: Support _deleted metadata column in vectorized read

GitBox Fri, 27 May 2022 20:36:44 -0700


flyrain commented on code in PR #4888:
URL: https://github.com/apache/iceberg/pull/4888#discussion_r884039841



##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java:
##########
@@ -70,8 +70,7 @@ public class TestSparkParquetReadMetadataColumns {
   private static final Schema PROJECTION_SCHEMA = new Schema(
       required(100, "id", Types.LongType.get()),
       required(101, "data", Types.StringType.get()),
-      MetadataColumns.ROW_POSITION,
-      MetadataColumns.IS_DELETED
+      MetadataColumns.ROW_POSITION

Review Comment:
   We need this change because the class `VectorizedReaderBuilder` is shared by 
all Spark versions. The change at line 94 of `VectorizedReaderBuilder` changes 
the type of the reader, as the following code shows. As a result, the read 
throws an exception in the method `IcebergArrowColumnVector.forHolder()` of the 
old Spark version. Dropping `IS_DELETED` from this projection should be fine, 
since the old Spark version doesn't actually support the `_deleted` metadata 
column.
   ```
           reorderedFields.add(new VectorizedArrowReader.DeletedVectorReader());
   ```
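
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual Iceberg code: `SharedBuilderSketch`, `PositionReader`, `DeletedReader`, `buildReaders`, and `oldConsumer` are hypothetical stand-ins, and the real check inside `IcebergArrowColumnVector.forHolder()` may look different. The sketch only assumes that a builder shared across versions starts emitting a new reader type, and that an older consumer, written before that type existed, rejects it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedBuilderSketch {
  // Hypothetical stand-ins for reader types; these are not the real Iceberg classes.
  interface Reader {}
  static final class PositionReader implements Reader {}
  static final class DeletedReader implements Reader {}

  // Shared builder (assumption): one implementation used by every engine version.
  // After the change, it emits a DeletedReader whenever "_deleted" is projected.
  public static List<Reader> buildReaders(List<String> projectedColumns) {
    List<Reader> readers = new ArrayList<>();
    for (String column : projectedColumns) {
      if (column.equals("_pos")) {
        readers.add(new PositionReader());
      } else if (column.equals("_deleted")) {
        readers.add(new DeletedReader());
      }
    }
    return readers;
  }

  // Old consumer (assumption): only knows the reader types that existed
  // when it was written, and throws on anything else.
  public static String oldConsumer(Reader reader) {
    if (reader instanceof PositionReader) {
      return "position";
    }
    throw new IllegalStateException(
        "Unsupported reader: " + reader.getClass().getSimpleName());
  }

  // Returns true if the old consumer can handle every reader built for the projection.
  public static boolean oldConsumerAccepts(List<String> projectedColumns) {
    try {
      for (Reader reader : buildReaders(projectedColumns)) {
        oldConsumer(reader);
      }
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Projection without _deleted: the old consumer handles it.
    System.out.println(oldConsumerAccepts(Arrays.asList("_pos")));
    // Projection with _deleted: the shared builder emits a type the old consumer rejects.
    System.out.println(oldConsumerAccepts(Arrays.asList("_pos", "_deleted")));
  }
}
```

In this sketch, removing `_deleted` from the projection is the only way to keep the old consumer working without touching it, which mirrors the reason for dropping `MetadataColumns.IS_DELETED` from the Spark 2.4 test schema.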



