zhangminglei commented on a change in pull request #3186:
URL: https://github.com/apache/iceberg/pull/3186#discussion_r749096922
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ConstantColumnVector.java
##########
@@ -119,6 +122,8 @@ public UTF8String getUTF8String(int rowId) {
@Override
public ColumnVector getChild(int ordinal) {
- throw new UnsupportedOperationException("ConstantColumnVector only supports primitives");
+ DataType sparkType = ((StructType) type).fields()[ordinal].dataType();
+ Type childType = SparkSchemaUtil.convert(sparkType);
+ return new ConstantColumnVector(childType, batchSize, ((InternalRow) constant).get(ordinal, sparkType));
Review comment:
> There isn't anything specific to ORC here, so I'm not convinced that
> the ORC readers handle nested constant values correctly
Ans: The change itself is not specific to ORC. The ORC vectorized read
currently uses it this way, while the Parquet vectorized read does not use it.
This is a general approach, not something tied to ORC itself.
> I'm not sure that this is the right approach.
Ans: During a vectorized ORC read, `VectorizedSparkOrcReaders#StructConverter`
uses `idToConstant` and converts constant values to `ConstantColumnVector`,
but `ConstantColumnVector` currently cannot represent a struct type, so I
need a way to represent a struct constant.
According to Spark's API, a `ColumnVector` that represents a struct must
implement the `getChild` method:
https://github.com/apache/spark/blob/v3.2.0/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java#L219
https://github.com/apache/spark/blob/v3.2.0/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java#L302
Since `getChild` returns a `ColumnVector`, and each child of a constant
struct is itself a constant, I think it can return a `ConstantColumnVector`.
I will also add more tests.
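
To illustrate the idea outside of Spark, here is a minimal sketch of a constant vector whose `getChild` returns another constant vector. It is only a simplified model: `SketchConstantVector` is a hypothetical class, the struct constant is assumed to be a plain `List` of field values instead of an `InternalRow`, and no Spark or Iceberg types are involved.

```java
import java.util.List;

// Hypothetical stand-in for a constant column vector; not the Iceberg or Spark class.
final class SketchConstantVector {
  private final Object constant; // the single value shared by every row
  private final int batchSize;   // number of rows the vector claims to hold

  SketchConstantVector(Object constant, int batchSize) {
    this.constant = constant;
    this.batchSize = batchSize;
  }

  int batchSize() {
    return batchSize;
  }

  // Every row holds the same constant, so rowId is ignored by all accessors.
  boolean isNullAt(int rowId) {
    return constant == null;
  }

  int getInt(int rowId) {
    return (Integer) constant;
  }

  String getString(int rowId) {
    return (String) constant;
  }

  // Assumption: a struct constant is modelled as a List of its field values.
  // The child at `ordinal` is constant too, so it is wrapped in a new
  // constant vector with the same batch size.
  SketchConstantVector getChild(int ordinal) {
    if (!(constant instanceof List)) {
      throw new UnsupportedOperationException("Not a struct constant");
    }
    Object field = ((List<?>) constant).get(ordinal);
    return new SketchConstantVector(field, batchSize);
  }
}
```

With a struct constant such as `(7, "a")`, `getChild(0).getInt(rowId)` yields 7 for any row, and calling `getChild` on a primitive constant still throws, as it did before the change.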
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]