huan233usc commented on code in PR #16292:
URL: https://github.com/apache/iceberg/pull/16292#discussion_r3338865939


##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java:
##########
@@ -958,4 +958,50 @@ public String toString() {
     @Override
     public void setBatchSize(int batchSize) {}
   }
+
+  public static class VectorizedVariantReader extends VectorizedArrowReader {
+    private final VectorizedArrowReader metadataReader;
+    private final VectorizedArrowReader valueReader;
+
+    public VectorizedVariantReader(
+        Types.NestedField icebergField,
+        VectorizedArrowReader metadataReader,
+        VectorizedArrowReader valueReader) {
+      super(icebergField);
+      this.metadataReader = metadataReader;
+      this.valueReader = valueReader;
+    }
+
+    @Override
+    public VectorHolder read(VectorHolder reuse, int numValsToRead) {
+      VectorHolder metadataHolder = metadataReader.read(null, numValsToRead);

Review Comment:
   Should we pass the previous `VectorHolder` contents through here instead of 
`null`, to avoid reallocating on every batch?
   
   The child vectors are handed to `VariantVectorHolder` without being closed, 
so they come back to this reader as `reuse`. Passing `null` ignores them and 
makes both leaf readers allocate fresh vectors for each batch.
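
To illustrate the allocation concern, here is a minimal, self-contained sketch. It does not use the Iceberg classes: `Holder`, `LeafReader`, `VariantHolder`, and `VariantReader` are hypothetical stand-ins, and the assumption that a leaf reader allocates only when it receives no usable holder is mine, not something confirmed by the PR.

```java
class ReuseSketch {
  // Hypothetical stand-in for a vector holder; not the Iceberg VectorHolder.
  static final class Holder {
    final int[] buffer;

    Holder(int capacity) {
      this.buffer = new int[capacity];
    }
  }

  // Hypothetical leaf reader: allocates only when there is nothing usable to reuse.
  static final class LeafReader {
    int allocations = 0;

    Holder read(Holder reuse, int numValsToRead) {
      if (reuse != null && reuse.buffer.length >= numValsToRead) {
        return reuse; // refill the existing buffer in place
      }
      allocations++;
      return new Holder(numValsToRead);
    }
  }

  // Hypothetical composite holder that carries both children back to the caller.
  static final class VariantHolder {
    final Holder metadata;
    final Holder value;

    VariantHolder(Holder metadata, Holder value) {
      this.metadata = metadata;
      this.value = value;
    }
  }

  static final class VariantReader {
    final LeafReader metadataReader = new LeafReader();
    final LeafReader valueReader = new LeafReader();
    final boolean passReuse;

    VariantReader(boolean passReuse) {
      this.passReuse = passReuse;
    }

    VariantHolder read(VariantHolder reuse, int numValsToRead) {
      // With passReuse == false this mirrors passing null to both children.
      Holder prevMetadata = (passReuse && reuse != null) ? reuse.metadata : null;
      Holder prevValue = (passReuse && reuse != null) ? reuse.value : null;
      return new VariantHolder(
          metadataReader.read(prevMetadata, numValsToRead),
          valueReader.read(prevValue, numValsToRead));
    }
  }

  // Reads the given number of batches and returns the total child allocations.
  static int allocationsAfter(boolean passReuse, int batches) {
    VariantReader reader = new VariantReader(passReuse);
    VariantHolder holder = null;
    for (int i = 0; i < batches; i++) {
      holder = reader.read(holder, 1024);
    }
    return reader.metadataReader.allocations + reader.valueReader.allocations;
  }

  public static void main(String[] args) {
    System.out.println(allocationsAfter(false, 3)); // prints 6
    System.out.println(allocationsAfter(true, 3)); // prints 2
  }
}
```

In this toy model, passing `null` costs two allocations per batch, while unpacking the previous children costs two allocations in total. Whether the real leaf readers behave the same way depends on how they treat the `reuse` argument.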



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatch.java:
##########
@@ -172,6 +173,11 @@ private boolean supportsParquetBatchReads(ScanTask task) {
   }
 
   private boolean supportsParquetBatchReads(Types.NestedField field) {
+    if (field.type().isVariantType()) {
+      String shredEnabled = table.properties().get(TableProperties.PARQUET_SHRED_VARIANTS);
+      return !"true".equalsIgnoreCase(shredEnabled);

Review Comment:
   ```suggestion
         return !PropertyUtil.propertyAsBoolean(table.properties(),
           TableProperties.PARQUET_SHRED_VARIANTS,
           TableProperties.PARQUET_SHRED_VARIANTS_DEFAULT);
   ```
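
A small sketch of why the helper form may be preferable. Everything here is a local stand-in: `propertyAsBoolean` below is a simplified re-implementation written for illustration, `SHRED_KEY` is a hypothetical key, and the actual value of `PARQUET_SHRED_VARIANTS_DEFAULT` is not stated in this thread. The point is only that the string comparison hard-codes "unset means false", whereas the helper takes the default from one declared constant.

```java
import java.util.HashMap;
import java.util.Map;

class ShredPropertySketch {
  // Hypothetical property key, used only for this illustration.
  static final String SHRED_KEY = "example.shred-variants";

  // Simplified stand-in for a default-aware boolean property lookup.
  static boolean propertyAsBoolean(
      Map<String, String> properties, String key, boolean defaultValue) {
    String value = properties.get(key);
    return value != null ? Boolean.parseBoolean(value) : defaultValue;
  }

  // Shape of the original check: an unset property is implicitly treated as false.
  static boolean batchReadsViaStringCompare(Map<String, String> properties) {
    return !"true".equalsIgnoreCase(properties.get(SHRED_KEY));
  }

  // Shape of the suggested check: an unset property falls back to the given default.
  static boolean batchReadsViaHelper(Map<String, String> properties, boolean shredDefault) {
    return !propertyAsBoolean(properties, SHRED_KEY, shredDefault);
  }

  public static void main(String[] args) {
    Map<String, String> unset = new HashMap<>();
    Map<String, String> enabled = new HashMap<>();
    enabled.put(SHRED_KEY, "TRUE");

    // Both forms agree when the property is set explicitly.
    System.out.println(batchReadsViaStringCompare(enabled)); // prints false
    System.out.println(batchReadsViaHelper(enabled, false)); // prints false

    // They agree on an unset property only while the default is false.
    System.out.println(batchReadsViaStringCompare(unset)); // prints true
    System.out.println(batchReadsViaHelper(unset, false)); // prints true
    System.out.println(batchReadsViaHelper(unset, true)); // prints false
  }
}
```

If the default constant is false, the two forms behave the same today; the helper form simply keeps the call site in step with the constant if it ever changes.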



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

