[I] Java Arrow stream reader cannot read struct vector with duplicate field name [datafusion-comet]

via GitHub Sun, 04 Aug 2024 15:47:37 -0700


viirya opened a new issue, #777:
URL: https://github.com/apache/datafusion-comet/issues/777


   ### Describe the bug
   
   Found this bug when fixing Spark SQL test failures for #651.
   
   We use the Java Arrow stream reader to read Arrow-format shuffle data. But if 
a struct vector has duplicate field names, Java Arrow throws the 
following error:
   
   ```
   [info]   Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 311.0 failed 1 times, most recent failure: Lost task 1.0 in stage 311.0 (TID 882) (192.168.86.44 executor driver): java.lang.IllegalArgumentException: not all nodes and buffers were consumed. nodes: [ArrowFieldNode [length=4, nullCount=0]] buffers: [ArrowBuf[9855], address:4929620864, capacity:28, ArrowBuf[9857], address:4929620928, capacity:1, ArrowBuf[9859], address:4929620992, capacity:32]
   [info]  at org.apache.comet.shaded.arrow.vector.VectorLoader.load(VectorLoader.java:89)
   [info]  at org.apache.comet.shaded.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:220)
   [info]  at org.apache.comet.shaded.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:161)
   [info]  at org.apache.comet.vector.StreamReader.nextBatch(StreamReader.scala:41)
   ```
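
The error above says the loader finished with a field node and three buffers left over. One way this could happen, offered here only as an assumption and not as a confirmed root cause, is that the reader-side struct keeps its children keyed by field name, so a duplicate name replaces the earlier child and the struct ends up with fewer children than the batch carries. The toy model below uses plain Java collections and no Arrow classes. `DuplicateFieldDemo`, `buildChildren`, and `load` are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFieldDemo {

    /** Toy stand-in for a struct vector that keys its children by field name. */
    public static Map<String, List<Integer>> buildChildren(List<String> fieldNames) {
        Map<String, List<Integer>> children = new LinkedHashMap<>();
        for (String name : fieldNames) {
            // Assumed behavior: a duplicate name overwrites the earlier child.
            children.put(name, new ArrayList<>());
        }
        return children;
    }

    /**
     * Toy loader: hands one serialized node to each child, then requires that
     * nothing is left over. Returns the number of children that were loaded.
     */
    public static int load(List<String> fieldNames, List<List<Integer>> nodes) {
        Map<String, List<Integer>> children = buildChildren(fieldNames);
        Iterator<List<Integer>> remaining = nodes.iterator();
        for (List<Integer> child : children.values()) {
            child.addAll(remaining.next());
        }
        if (remaining.hasNext()) {
            throw new IllegalArgumentException("not all nodes were consumed");
        }
        return children.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = List.of(List.of(1, 2), List.of(3, 4));

        // Distinct names: every node finds a child.
        System.out.println("children loaded: " + load(List.of("a", "b"), nodes));

        // Duplicate names: only one child survives, so one node is left over.
        try {
            load(List.of("a", "a"), nodes);
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model the writer emits one node per declared field, while the reader only has one child per distinct name, so the final "everything consumed" check fails. Whether the shaded Arrow reader fails for this same reason would need to be confirmed against its source.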
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

