viirya opened a new issue, #973:
URL: https://github.com/apache/datafusion-comet/issues/973

   ### Describe the bug
   
   Integrating Comet with Iceberg internally gets the following error if there 
are deleted rows in the Iceberg table:
   
   ```
   org.apache.comet.CometNativeException: Invalid argument error: all columns 
in a record batch must have the specified row count
   ```
   
   This happens because Iceberg stores row mappings in its `CometVector` 
implementations and uses them to skip deleted rows while iterating over the 
rows in a batch. The row values in the arrays are never actually removed. The 
Iceberg batch reader sets the row count of a returned record batch to the 
"logical" number of rows remaining after deletion. Java Arrow accepts this.
   
   However, arrow-rs performs a stricter check on the consistency between the 
array lengths and the row-count parameter. When it detects a mismatch, it 
throws an error like the one above.
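
   A simplified sketch of the kind of consistency check arrow-rs applies when 
a record batch is constructed (this is an illustration, not the actual 
arrow-rs source):

   ```rust
   // Simplified stand-in for arrow-rs's record-batch validation: every
   // column's length must equal the declared row count.
   fn validate_batch(column_lens: &[usize], row_count: usize) -> Result<(), String> {
       for (i, &len) in column_lens.iter().enumerate() {
           if len != row_count {
               return Err(format!(
                   "Invalid argument error: all columns in a record batch \
                    must have the specified row count (column {i} has {len} \
                    rows, expected {row_count})"
               ));
           }
       }
       Ok(())
   }

   fn main() {
       // Physical and logical counts agree: accepted.
       assert!(validate_batch(&[5, 5], 5).is_ok());
       // Arrays still hold 5 values but Iceberg reports the logical
       // post-delete count of 3: rejected, producing the error above.
       assert!(validate_batch(&[5, 5], 3).is_err());
       println!("mismatched row count rejected");
   }
   ```

   Java Arrow performs no equivalent check, which is why the same batches 
pass through the Java reader unmodified.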
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

