flyrain commented on issue #3141: URL: https://github.com/apache/iceberg/issues/3141#issuecomment-932599379
There are two ways to approach this:

1. Iterate over the `ColumnarBatch` object to filter out deleted rows, and remove or overwrite the deleted rows in the corresponding column vectors. Performance is the major concern: this adds an extra pass over every row in the batch, and memory copies may occur. However, we may have to do this anyway for equality deletes, since there is no way to know whether a row is deleted without reading its values.
2. For position deletes, add a bitmap to Arrow's column vectors to filter out deleted rows. I'm not sure whether this is possible, though.

cc @aokolnychyi @sunchao
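A minimal sketch contrasting the two approaches follows. It assumes a batch is modeled as a dict of plain Python lists standing in for column vectors, and that a Python int serves as the deleted-row bitmap. None of this is Spark's `ColumnarBatch` or Arrow's API, and all function names are hypothetical. It also shows why equality deletes differ: the position-delete bitmap is built from row positions alone, while the equality-delete bitmap has to read each row's key values.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical model: a batch maps column names to plain lists that stand in
# for column vectors. This is not Spark's ColumnarBatch or Arrow's API.
Batch = Dict[str, List[object]]


def pos_delete_bitmap(batch_start: int, num_rows: int, deleted_positions: Set[int]) -> int:
    """Position deletes: bit i is set if row i is deleted, decided from its file position only."""
    bits = 0
    for i in range(num_rows):
        if batch_start + i in deleted_positions:
            bits |= 1 << i
    return bits


def eq_delete_bitmap(batch: Batch, key_cols: List[str], deleted_keys: Set[Tuple]) -> int:
    """Equality deletes: each row's key values must be read to decide whether it is deleted."""
    bits = 0
    for i in range(len(batch[key_cols[0]])):
        if tuple(batch[c][i] for c in key_cols) in deleted_keys:
            bits |= 1 << i
    return bits


def compact(batch: Batch, deleted_bits: int) -> Batch:
    """Approach 1: an extra pass over every row, copying live values into new vectors."""
    return {
        name: [v for i, v in enumerate(col) if not (deleted_bits >> i) & 1]
        for name, col in batch.items()
    }


def live_values(column: List[object], deleted_bits: int) -> Iterator[object]:
    """Approach 2: leave the vector untouched; the reader skips rows whose bit is set."""
    for i, v in enumerate(column):
        if not (deleted_bits >> i) & 1:
            yield v


batch = {"id": [10, 11, 12, 13], "name": ["a", "b", "c", "d"]}
# Suppose the batch starts at file position 100 and positions 101 and 103 are deleted.
pos_bits = pos_delete_bitmap(100, 4, {101, 103})
print(compact(batch, pos_bits))                  # {'id': [10, 12], 'name': ['a', 'c']}
print(list(live_values(batch["id"], pos_bits)))  # [10, 12]
# An equality delete on id = 12 needs the id values of the batch.
eq_bits = eq_delete_bitmap(batch, ["id"], {(12,)})
print(compact(batch, pos_bits | eq_bits))        # {'id': [10], 'name': ['a']}
```

The trade-off mirrors the two options: `compact` pays for a copy but leaves downstream code with dense vectors, while `live_values` avoids the copy but requires every consumer to consult the bitmap.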
