[GitHub] [arrow-datafusion] thinkharderdev commented on pull request #5362: Add index interface method

via GitHub Thu, 23 Feb 2023 12:03:40 -0800


thinkharderdev commented on PR #5362:
URL: https://github.com/apache/arrow-datafusion/pull/5362#issuecomment-1442362033


   > The current planner will test each individual part of the conjunction and 
the Table will have to scan() all `Gardner`s - even though it would have done a 
point lookup on `Brent Gardner` (assuming I'm the only one).
   
   This is a bit awkward in the current model. We have a (very rudimentary)
notion of cost modeling in the parquet scan's predicate pushdown, which we use
to decide the order in which to evaluate the predicates. As you mention, it's
not quite ideal, since by that point we have already split the conjunctions.
It does have one advantage, though: by that point you are dealing with a single
file, so you have more metadata for the cost calculation (column chunk sizes,
etc.). Is the idea here that we bubble information about global sort indexes up
to the logical `TableProvider`, so that we avoid splitting those two predicates
when pushing down to the scan?
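
To make the question concrete, here is a rough sketch of what "not splitting" could look like. It is not DataFusion code and does not use any DataFusion API: `EqPredicate`, `CompositeIndex`, `Pushdown`, and `plan_pushdown` are all hypothetical names, as are the `last_name` / `first_name` columns. The assumption is that the table advertises a global sort index over an ordered list of columns, and that the planner keeps together whichever equality conjuncts cover a prefix of that index.

```rust
/// Hypothetical: one `column = value` conjunct from a filter expression.
#[derive(Debug, Clone, PartialEq)]
struct EqPredicate {
    column: String,
    value: String,
}

/// Hypothetical: a global sort index the table advertises, columns in sort order.
struct CompositeIndex {
    columns: Vec<String>,
}

/// Hypothetical planning outcome.
#[derive(Debug, PartialEq)]
enum Pushdown {
    /// Conjuncts covering a prefix of the index are handed over as one lookup
    /// key; anything left over is still evaluated separately.
    IndexLookup {
        key: Vec<String>,
        residual: Vec<EqPredicate>,
    },
    /// No usable index prefix: every conjunct is tested on its own.
    Split(Vec<EqPredicate>),
}

/// Match equality conjuncts against the index columns in sort order and stop
/// at the first index column that has no matching conjunct.
fn plan_pushdown(conjuncts: &[EqPredicate], index: &CompositeIndex) -> Pushdown {
    let mut key = Vec::new();
    let mut used = vec![false; conjuncts.len()];

    for col in &index.columns {
        match conjuncts.iter().position(|p| &p.column == col) {
            Some(i) => {
                key.push(conjuncts[i].value.clone());
                used[i] = true;
            }
            None => break,
        }
    }

    if key.is_empty() {
        return Pushdown::Split(conjuncts.to_vec());
    }

    // Conjuncts that did not contribute to the key stay behind as filters.
    let residual = conjuncts
        .iter()
        .zip(&used)
        .filter(|(_, u)| !**u)
        .map(|(p, _)| p.clone())
        .collect();

    Pushdown::IndexLookup { key, residual }
}
```

With an index on (`last_name`, `first_name`), the conjunction `first_name = 'Brent' AND last_name = 'Gardner'` would come out as a single `IndexLookup` with key `["Gardner", "Brent"]` rather than two independent filters. A filter on `first_name` alone would fall back to `Split`, because it does not cover the leading index column.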


