flyrain opened a new issue #3141: URL: https://github.com/apache/iceberg/issues/3141
The vectorized reader does NOT currently support row-level deletes; it is turned off. See this code: https://github.com/apache/iceberg/blob/80ff749b823098db82d0a8dc48c7e9db5ab3741b/spark3/src/main/java/org/apache/iceberg/spark/source/SparkBatchScan.java#L180

I'm working on a solution to enable vectorized reading with row-level deletes. The idea is to filter out deleted rows when Iceberg returns a batch for Spark to consume.

The challenge is that `ColumnarBatch` is a Spark class and is declared `final`, so we cannot extend it in Iceberg. We could filter out deleted rows by iterating over the batch and constructing a new batch object, but that would be a significant performance concern. I will try to propose making `ColumnarBatch` non-final in Spark. Hopefully it will be accepted; otherwise, we need to think about other ways to approach this feature.

Any feedback? cc @aokolnychyi @rdblue @RussellSpitzer @jackye1995 @sunchao @chenjunjiedada
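A minimal, self-contained sketch of the filtering idea, assuming position deletes only. The names `DeletedRowFilter`, `liveRowIds` and `valueAt` are hypothetical and are not Iceberg or Spark APIs. Instead of copying column values into a new batch, it records which rows of the batch are still live, so a wrapper could read values through that row-id mapping.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

/**
 * Hypothetical sketch (not Iceberg or Spark API): computes which rows of a
 * vectorized batch survive position deletes without copying column data.
 */
final class DeletedRowFilter {

  private DeletedRowFilter() {}

  /**
   * @param batchStartPos position within the data file of the batch's first row
   * @param numRows       number of rows in the batch
   * @param isDeleted     returns true if the given file position is deleted
   * @return 0-based indexes (within the batch) of rows that are still live
   */
  static int[] liveRowIds(long batchStartPos, int numRows, LongPredicate isDeleted) {
    int[] mapping = new int[numRows];
    int live = 0;
    for (int i = 0; i < numRows; i++) {
      if (!isDeleted.test(batchStartPos + i)) {
        mapping[live++] = i; // remember the original row index; no column values are copied
      }
    }
    return Arrays.copyOf(mapping, live); // trim the mapping to the live-row count
  }

  /** Reads a value through the mapping, as a hypothetical filtering column wrapper might. */
  static int valueAt(int[] column, int[] mapping, int liveRow) {
    return column[mapping[liveRow]];
  }
}
```

For example, a batch of 5 rows starting at file position 2, with positions 3 and 5 deleted, yields the mapping `[0, 2, 4]`, and live row 1 reads `column[2]`. Only one `int[]` per batch is built, so the extra work grows with the row count rather than with the number or width of columns; whether Spark's `ColumnarBatch` can be exposed through such a mapping is the open question in this issue.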
