lirui-apache opened a new issue #3139:
URL: https://github.com/apache/iceberg/issues/3139
This can be reproduced by modifying the test
`TestPartitionValues::testPartitionedByNestedString` so that it writes in ORC
format instead of Parquet, e.g. with the following change:
```java
// write into iceberg
sourceDF.write()
    .format("iceberg")
    .option(WRITE_FORMAT, format) // add this line
    .mode(SaveMode.Append)
    .save(baseLocation);
```
The test then fails with:
```shell
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:659)
    at java.util.ArrayList.get(ArrayList.java:435)
    at org.apache.iceberg.orc.OrcValueReaders$StructReader.<init>(OrcValueReaders.java:161)
    at org.apache.iceberg.spark.data.SparkOrcValueReaders$StructReader.<init>(SparkOrcValueReaders.java:143)
    at org.apache.iceberg.spark.data.SparkOrcValueReaders.struct(SparkOrcValueReaders.java:70)
    at org.apache.iceberg.spark.data.SparkOrcReader$ReadBuilder.record(SparkOrcReader.java:75)
    at org.apache.iceberg.spark.data.SparkOrcReader$ReadBuilder.record(SparkOrcReader.java:65)
    at org.apache.iceberg.orc.OrcSchemaWithTypeVisitor.visitRecord(OrcSchemaWithTypeVisitor.java:71)
    at org.apache.iceberg.orc.OrcSchemaWithTypeVisitor.visit(OrcSchemaWithTypeVisitor.java:38)
    at org.apache.iceberg.orc.OrcSchemaWithTypeVisitor.visit(OrcSchemaWithTypeVisitor.java:32)
    at org.apache.iceberg.spark.data.SparkOrcReader.<init>(SparkOrcReader.java:52)
    at org.apache.iceberg.spark.source.RowDataReader.lambda$newOrcIterable$2(RowDataReader.java:164)
    at org.apache.iceberg.orc.OrcIterable.iterator(OrcIterable.java:108)
    at org.apache.iceberg.orc.OrcIterable.iterator(OrcIterable.java:45)
    at org.apache.iceberg.util.Filter.lambda$filter$0(Filter.java:35)
    at org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:73)
    at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:78)
```
In `RowDataReader::newOrcIterable`, we exclude the struct field because its
inner string is a constant (the partition value), so the `OrcIterable` is
created with an empty schema. The exception is then thrown later, when the
readers are constructed (in `OrcValueReaders$StructReader.<init>`, per the
stack trace above).
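
To make the failure mode concrete, here is a minimal plain-Java sketch of the pattern described above. It uses no Iceberg classes: `Column`, `withoutConstants`, and `firstColumnName` are hypothetical names, and the reader's "take the first column without checking" behavior is an assumption inferred from the `ArrayList.get` frame in the stack trace, not the actual code in `OrcValueReaders`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class EmptyProjectionSketch {

    /** Hypothetical stand-in for a top-level column and the ids of its leaf fields. */
    static final class Column {
        final String name;
        final List<Integer> leafIds;

        Column(String name, Integer... leafIds) {
            this.name = name;
            this.leafIds = Arrays.asList(leafIds);
        }
    }

    /** Keeps only the columns that still have at least one non-constant leaf to read. */
    static List<Column> withoutConstants(List<Column> columns, Set<Integer> constantIds) {
        List<Column> kept = new ArrayList<>();
        for (Column column : columns) {
            if (!constantIds.containsAll(column.leafIds)) {
                kept.add(column);
            }
        }
        return kept;
    }

    /** Assumed reader behavior: reads the first projected column without an emptiness check. */
    static String firstColumnName(List<Column> projected) {
        return projected.get(0).name;
    }

    public static void main(String[] args) {
        // One struct column whose only leaf (id 2) is the partition value.
        List<Column> schema = Collections.singletonList(new Column("nested", 2));
        Set<Integer> constantIds = Collections.singleton(2);

        // Excluding the constant leaf removes the whole struct, leaving nothing to read.
        List<Column> projected = withoutConstants(schema, constantIds);
        System.out.println("projected columns: " + projected.size());

        try {
            firstColumnName(projected);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("reader construction failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Running `main` prints `projected columns: 0` followed by `reader construction failed: IndexOutOfBoundsException`. If the real code follows this pattern, either the projection step or the reader constructor would need to handle the empty case.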