rdblue commented on pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#issuecomment-861040175


   @jackye1995, my primary concern with this is that you're reading the data just to decide whether or not each row should be kept. I think that would mean reading essentially all of the data twice in Trino: once into pages, and once more to produce the `CloseableIterable<Boolean>` used to mark whether a row in a page should be kept. That doesn't seem worth it to me, although you could argue it's acceptable when the projection is narrow. You may also already be re-using the pages to produce the underlying `StructLike` iterable; if that's the case, then you can ignore the rest of this...
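
   To make the concern concrete, this is roughly the shape I'm picturing (all names here are placeholders, not the actual API in this PR; the flags come from a second read of the same file, so every row is decoded twice):

```java
import java.util.Iterator;
import java.util.function.Consumer;
import org.apache.iceberg.io.CloseableIterable;
import io.trino.spi.Page;

class DoubleReadFilter {
  // Hypothetical sketch only: one pass over the file produced the pages,
  // and a separate pass produced one keep/drop flag per row.
  static void filterPages(
      Iterable<Page> pages,                  // first read: data as pages
      CloseableIterable<Boolean> keepFlags,  // second read: one flag per row
      Consumer<Page> sink) {
    Iterator<Boolean> keep = keepFlags.iterator();
    for (Page page : pages) {
      int[] kept = new int[page.getPositionCount()];
      int count = 0;
      for (int pos = 0; pos < page.getPositionCount(); pos++) {
        if (keep.next()) {
          kept[count++] = pos;
        }
      }
      // keep only the retained positions in the output page
      sink.accept(page.getPositions(kept, 0, count));
    }
  }
}
```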
   
   I think a better approach would be to adapt the pages in Trino to produce a view of each row that implements `StructLike`, at least in a limited capacity (i.e. top-level and struct columns only, no array/map nesting). That's what happens in Spark vectorization: a class wraps the batch and provides access to a row by its ordinal. With that, you'd be able to read the data into pages and still use Iceberg's delete filters to test each row.
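
   Roughly, the shape I have in mind is something like this sketch (the class name is made up and the single `getLong` accessor is just a stand-in; real code would dispatch on each channel's type):

```java
import org.apache.iceberg.StructLike;
import io.trino.spi.Page;
import io.trino.spi.block.Block;

// Hypothetical read-only StructLike view over one row of a Trino Page,
// analogous to the batch-row wrapper used by Spark vectorization.
class PageRowStructLike implements StructLike {
  private final Page page;
  private int position;

  PageRowStructLike(Page page) {
    this.page = page;
  }

  // Reuse one instance per page and move it from row to row.
  PageRowStructLike atPosition(int position) {
    this.position = position;
    return this;
  }

  @Override
  public int size() {
    return page.getChannelCount();
  }

  @Override
  public <T> T get(int pos, Class<T> javaClass) {
    Block block = page.getBlock(pos);
    if (block.isNull(position)) {
      return null;
    }
    // Illustrative only: real code would dispatch on the channel's type;
    // getLong covers bigint-backed columns and is just a stand-in here.
    return javaClass.cast(block.getLong(position, 0));
  }

  @Override
  public <T> void set(int pos, T value) {
    throw new UnsupportedOperationException("read-only view");
  }
}
```

   With something like that, the delete filter can test each row in place while the data stays in pages, instead of materializing a second copy.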

