flyrain opened a new issue #3141:
URL: https://github.com/apache/iceberg/issues/3141


   The vectorized reader does not currently support row-level deletes. It is 
turned off in that case; see this code:
https://github.com/apache/iceberg/blob/80ff749b823098db82d0a8dc48c7e9db5ab3741b/spark3/src/main/java/org/apache/iceberg/spark/source/SparkBatchScan.java#L180
   
   I'm working on a solution to enable vectorized reads with row-level deletes.
The idea is to filter out deleted rows when Iceberg returns a batch for Spark to
consume. The challenge is that the class `ColumnarBatch` comes from Spark and is
final, so we cannot extend it in Iceberg. We could, of course, filter out
deleted rows by iterating over the batch and constructing a new batch object,
but that would be a significant performance concern. I will try to propose
making `ColumnarBatch` non-final in Spark. Hopefully that will be accepted.
Otherwise, we need to think about other ways to approach this feature.
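
   To make the copy-based fallback concrete, here is a minimal sketch of
filtering by iterating and building a new batch. It does not use Spark's
`ColumnarBatch`: the batch is modeled as plain `int[]` columns, and the class
name `CopyFilterSketch`, the method `filterDeleted`, and the
`deletedPositions` set are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical stand-in for a columnar batch: one int[] per column, all the same length.
// This is NOT Spark's ColumnarBatch; it only illustrates the copy-based approach.
public class CopyFilterSketch {

  // Returns new column arrays holding only the rows whose file position is not deleted.
  // batchStartPos is the (assumed) file position of the first row in the batch.
  public static int[][] filterDeleted(
      int[][] columns, long batchStartPos, Set<Long> deletedPositions) {
    int numRows = columns.length == 0 ? 0 : columns[0].length;

    // First pass: collect the indexes of the surviving rows.
    int[] live = new int[numRows];
    int liveCount = 0;
    for (int row = 0; row < numRows; row++) {
      if (!deletedPositions.contains(batchStartPos + row)) {
        live[liveCount++] = row;
      }
    }

    // Second pass: copy every surviving value of every column into a new array.
    // This per-value copy is the extra work the paragraph above worries about.
    int[][] result = new int[columns.length][liveCount];
    for (int col = 0; col < columns.length; col++) {
      for (int i = 0; i < liveCount; i++) {
        result[col][i] = columns[col][live[i]];
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Four rows at file positions 100..103; positions 101 and 103 are deleted.
    int[][] batch = { {10, 11, 12, 13}, {20, 21, 22, 23} };
    int[][] filtered = filterDeleted(batch, 100L, Set.of(101L, 103L));
    System.out.println(Arrays.deepToString(filtered)); // prints [[10, 12], [20, 22]]
  }
}
```

   The cost grows with the number of columns times the number of surviving
rows, and it is paid on every batch, even when only a few rows are deleted.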
   
   Any feedback?
   
   cc @aokolnychyi @rdblue @RussellSpitzer @jackye1995 @sunchao @chenjunjiedada 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


