viirya opened a new issue, #777:
URL: https://github.com/apache/datafusion-comet/issues/777
### Describe the bug
Found this bug when fixing Spark SQL test failures for #651.
We use the Java Arrow stream reader to read Arrow-format shuffle data. But if
a struct vector has duplicate field names, Java Arrow throws the
following error:
```
[info] Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 311.0 failed 1 times, most recent failure: Lost task 1.0 in stage 311.0 (TID 882) (192.168.86.44 executor driver): java.lang.IllegalArgumentException: not all nodes and buffers were consumed. nodes: [ArrowFieldNode [length=4, nullCount=0]] buffers: [ArrowBuf[9855], address:4929620864, capacity:28, ArrowBuf[9857], address:4929620928, capacity:1, ArrowBuf[9859], address:4929620992, capacity:32]
[info] at org.apache.comet.shaded.arrow.vector.VectorLoader.load(VectorLoader.java:89)
[info] at org.apache.comet.shaded.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:220)
[info] at org.apache.comet.shaded.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:161)
[info] at org.apache.comet.vector.StreamReader.nextBatch(StreamReader.scala:41)
```
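A possible reading of the error, stated as an assumption and not confirmed in this issue: if the reader keeps struct children in a name-keyed map, a repeated field name collapses two children into one, so the reader consumes fewer nodes and buffers than the writer emitted. The sketch below is a plain-Java model of that idea only. It does not use the Arrow API, and `DuplicateFieldSketch`, `buildChildren`, `leftoverNodes`, and `load` are hypothetical names made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Arrow code: shows how name-keyed children
// can leave serialized nodes unconsumed when field names repeat.
public class DuplicateFieldSketch {

    // Store children by name; a repeated name replaces the earlier entry
    // (assumed behavior, chosen to mirror a "replace on conflict" policy).
    static Map<String, Integer> buildChildren(List<String> fieldNames) {
        Map<String, Integer> children = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            children.put(fieldNames.get(i), i);
        }
        return children;
    }

    // The writer emits one node per schema field; the reader consumes
    // one node per child it actually holds. Returns the leftover count.
    static int leftoverNodes(List<String> fieldNames) {
        Deque<String> nodes = new ArrayDeque<>(fieldNames);
        for (String ignored : buildChildren(fieldNames).keySet()) {
            nodes.poll();
        }
        return nodes.size();
    }

    // Fail the same way the stack trace does when nodes are left over.
    static void load(List<String> fieldNames) {
        int leftover = leftoverNodes(fieldNames);
        if (leftover > 0) {
            throw new IllegalArgumentException(
                "not all nodes and buffers were consumed. leftover nodes: " + leftover);
        }
    }

    public static void main(String[] args) {
        load(Arrays.asList("a", "b")); // unique names: loads cleanly
        try {
            load(Arrays.asList("a", "a")); // duplicate names: one node left over
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model a struct with fields `a, a` leaves exactly one node unread, which matches the shape of the message above (a single leftover `ArrowFieldNode`), though whether the real reader fails for this reason would still need to be checked against the Arrow source.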
### Steps to reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_