[GitHub] [iceberg] flyrain commented on a diff in pull request #4888: Core: Support _deleted metadata column in vectorized read

GitBox Fri, 27 May 2022 17:27:30 -0700


flyrain commented on code in PR #4888:
URL: https://github.com/apache/iceberg/pull/4888#discussion_r884039841



##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java:
##########
@@ -70,8 +70,7 @@ public class TestSparkParquetReadMetadataColumns {
   private static final Schema PROJECTION_SCHEMA = new Schema(
       required(100, "id", Types.LongType.get()),
       required(101, "data", Types.StringType.get()),
-      MetadataColumns.ROW_POSITION,
-      MetadataColumns.IS_DELETED
+      MetadataColumns.ROW_POSITION

Review Comment:
   We need this change because `VectorizedReaderBuilder` is shared by all Spark 
versions. The change at line 94 of `VectorizedReaderBuilder` (shown below) 
changes the type of the reader. With the old Spark version, the read then 
throws an exception in `IcebergArrowColumnVector.forHolder()`. Dropping 
`MetadataColumns.IS_DELETED` from this test's projection should be fine, since 
the old Spark version doesn't really support the `_deleted` metadata column.
   ```
           reorderedFields.add(new VectorizedArrowReader.DeletedVectorReader());
   ```
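
To illustrate the failure mode, here is a minimal, self-contained sketch. Every class and method in it (`SharedBuilderSketch`, `buildReaders`, `legacyWrap`, and the reader types) is a hypothetical stand-in, not the actual Iceberg code; it only models a shared builder that starts emitting a new reader type while an older consumer still rejects anything it does not recognize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedBuilderSketch {
  // Hypothetical stand-ins for reader types; these are NOT the real Iceberg classes.
  interface Reader {}
  static class DataReader implements Reader {}
  static class PositionReader implements Reader {}
  static class DeletedReader implements Reader {} // the newly introduced reader type

  // Models a builder shared by every Spark version: once "_deleted" is
  // projected, it emits the new reader type regardless of who consumes it.
  static List<Reader> buildReaders(List<String> projectedColumns) {
    List<Reader> readers = new ArrayList<>();
    for (String column : projectedColumns) {
      switch (column) {
        case "_pos":
          readers.add(new PositionReader());
          break;
        case "_deleted":
          readers.add(new DeletedReader());
          break;
        default:
          readers.add(new DataReader());
      }
    }
    return readers;
  }

  // Models the older consumer: it only knows the reader types that existed
  // before the change and throws on anything else.
  static String legacyWrap(Reader reader) {
    if (reader instanceof DataReader) {
      return "data";
    }
    if (reader instanceof PositionReader) {
      return "position";
    }
    throw new IllegalStateException(
        "Unsupported reader: " + reader.getClass().getSimpleName());
  }

  // Returns true when the older consumer can wrap every reader for the projection.
  static boolean legacyReadSucceeds(List<String> projectedColumns) {
    try {
      for (Reader reader : buildReaders(projectedColumns)) {
        legacyWrap(reader);
      }
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Projection without the _deleted column: the older consumer copes.
    System.out.println(legacyReadSucceeds(Arrays.asList("id", "data", "_pos")));
    // Projection with the _deleted column: the older consumer throws.
    System.out.println(legacyReadSucceeds(Arrays.asList("id", "data", "_pos", "_deleted")));
  }
}
```

In this sketch the first projection succeeds and the second fails, which mirrors why the test for the old Spark version stops projecting `MetadataColumns.IS_DELETED` rather than teaching the old consumer about the new reader.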



