yyanyy commented on a change in pull request #1641:
URL: https://github.com/apache/iceberg/pull/1641#discussion_r513832352
##########
File path:
spark/src/main/java/org/apache/iceberg/spark/source/StructInternalRow.java
##########
@@ -146,8 +146,13 @@ public UTF8String getUTF8String(int ordinal) {
Review comment:
The change to this class was mostly trying to use it in
`TestSparkParquetMergingMetrics`. Currently this class is only being used for
reading rows for metadata tables (in
[RowDataReader](https://github.com/apache/iceberg/blob/master/spark/src/main/java/org/apache/iceberg/spark/source/RowDataReader.java#L189),
and I think only the metadata table will produce `DataTask`).
In the tests I wanted to convert `Record` to `InternalRow` for testing Spark
appender. I was debating if I should expand this class beyond its current
usage, or to write a new converter. Here are the two things I need to change to
(partially) implement the former:
1) null handling which results to the change in `get()`: `RowDataReader`
doesn't call `get()` directly (it uses Dyn reflections for reading individual
attributes and skip nulls) for converting into other Spark internal row
representation (`UnsafeRow` in this case), thus we didn't see issue. However,
when spark parquet writer uses `get()`, without this change NPE will occur.
Note that even after this change, other use cases of this class (e.g.
`getUTF8String()` are still not null safe, and I wonder if people have opinion
on if we want a full null-safe support update to all methods in this class.
2) for `getBinary()` change, currently we convert `Fixed` type to binary for
Spark
([link](https://github.com/apache/iceberg/blob/master/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java#L118-L119)),
and the method I used for creating random records generate fixed type with
`byte[]`
([link](https://github.com/apache/iceberg/blob/master/api/src/test/java/org/apache/iceberg/util/RandomUtil.java#L130-L133)),
and before this change `getBinary()` doesn't work for byte[] implementation of
fixed type. Alternatively we could wrap fixed type in random record generator
the [same way we do for binary
type](https://github.com/apache/iceberg/blob/master/data/src/test/java/org/apache/iceberg/data/RandomGenericData.java#L229-L230).
I decide to do the former to allow binary related types to be more flexible
when wrapping them in this class, but I guess this comes back to the question
to if we want to evolve this class or to create a separate converter.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]