zhangminglei commented on a change in pull request #3186:
URL: https://github.com/apache/iceberg/pull/3186#discussion_r749096922



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ConstantColumnVector.java
##########
@@ -119,6 +122,8 @@ public UTF8String getUTF8String(int rowId) {
 
   @Override
   public ColumnVector getChild(int ordinal) {
-    throw new UnsupportedOperationException("ConstantColumnVector only supports primitives");
+    DataType sparkType = ((StructType) type).fields()[ordinal].dataType();
+    Type childType = SparkSchemaUtil.convert(sparkType);
+    return new ConstantColumnVector(childType, batchSize, ((InternalRow) constant).get(ordinal, sparkType));

Review comment:
   > There isn't anything specific to ORC here, so I'm not convinced that the ORC readers handle nested constant values correctly
   
   Ans: The change itself is not specific to ORC. ORC currently uses it this way, whereas Parquet's vectorized read does not use it. It is a general mechanism and is not tied to ORC itself.
   
   > I'm not sure that this is the right approach.
   
   Ans: When doing a vectorized read for ORC, ```VectorizedSparkOrcReaders#StructConverter``` uses ```idToConstant``` and converts constant values to ```ConstantColumnVector```, but ```ConstantColumnVector``` currently cannot represent a struct type. So I need a struct constant.
   
   According to Spark's API, a ```ColumnVector``` that represents a struct must 
implement the ```getChild``` method.
   
https://github.com/apache/spark/blob/v3.2.0/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java#L219
   
https://github.com/apache/spark/blob/v3.2.0/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java#L302
   
   Since ```getChild``` returns a ```ColumnVector```, and each child of a constant struct is itself a constant, I think it can return a ```ConstantColumnVector```.
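
   To illustrate the reasoning, here is a minimal sketch with no Spark or Iceberg dependency. The class ```ConstantVectorSketch``` and its methods are hypothetical names used only for this example, and a struct constant is modeled as a plain ```List``` rather than an ```InternalRow```. It is not the code in this PR; it only shows that a field of a constant struct can be served by another constant vector of the same batch size.

```java
import java.util.List;

// Hypothetical, Spark-free model of a constant column vector.
// All names here are illustrative and are not part of Spark or Iceberg.
final class ConstantVectorSketch {
  // Either a scalar value, or a List of field values when the constant is a struct.
  private final Object constant;
  private final int batchSize;

  ConstantVectorSketch(Object constant, int batchSize) {
    this.constant = constant;
    this.batchSize = batchSize;
  }

  boolean isStruct() {
    return constant instanceof List;
  }

  int batchSize() {
    return batchSize;
  }

  // Every row of a constant vector holds the same value, so rowId is only range-checked.
  Object get(int rowId) {
    if (rowId < 0 || rowId >= batchSize) {
      throw new IndexOutOfBoundsException("rowId: " + rowId);
    }
    return constant;
  }

  // A field of a constant struct is itself constant, so the child is
  // another constant vector with the same batch size.
  ConstantVectorSketch getChild(int ordinal) {
    if (!isStruct()) {
      throw new UnsupportedOperationException("Not a struct constant");
    }
    List<?> fields = (List<?>) constant;
    return new ConstantVectorSketch(fields.get(ordinal), batchSize);
  }
}
```

   Because ```getChild``` builds the child from the field value, nested struct constants are handled by the same call applied recursively, and non-struct constants still reject ```getChild```.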
   
   Besides, I will add more tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
