alamb commented on PR #9956:
URL: https://github.com/apache/arrow-rs/pull/9956#issuecomment-4769412416

   One thing @adriangb, @zhuqi-lucas, and I have noticed in DataFusion is that
getting heuristics to work well is very challenging. For example, the best
cutoff values often vary from architecture to architecture (e.g. is 32
contiguous 1s good, or should it be 64?).
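To make the tuning problem concrete, here is a minimal sketch of a fixed-cutoff heuristic of this kind. It assumes the "contiguous 1s" are runs of selected rows in a boolean filter mask, and that the choice is between applying the mask directly and converting it into select/skip ranges. `SelectionStrategy`, `avg_true_run_len`, `choose_strategy`, and `MIN_AVG_RUN_LEN` are hypothetical names, not arrow-rs APIs.

```rust
/// Hypothetical strategies for applying a predicate result to decoded rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SelectionStrategy {
    /// Apply the filter result as a dense bitmask.
    Bitmask,
    /// Convert the mask into contiguous select/skip ranges.
    Ranges,
}

/// Hypothetical fixed cutoff: the kind of constant that may need
/// per-architecture tuning (is 32 right, or should it be 64?).
pub const MIN_AVG_RUN_LEN: usize = 32;

/// Average length of the runs of `true` values in `mask` (0.0 if there are none).
pub fn avg_true_run_len(mask: &[bool]) -> f64 {
    let mut runs = 0usize;
    let mut selected = 0usize;
    let mut prev = false;
    for &bit in mask {
        if bit {
            selected += 1;
            // A `true` following a `false` (or the start) begins a new run.
            if !prev {
                runs += 1;
            }
        }
        prev = bit;
    }
    if runs == 0 {
        0.0
    } else {
        selected as f64 / runs as f64
    }
}

/// Prefer ranges when selected rows come in long runs, otherwise a bitmask.
pub fn choose_strategy(mask: &[bool], min_avg_run_len: usize) -> SelectionStrategy {
    if avg_true_run_len(mask) >= min_avg_run_len as f64 {
        SelectionStrategy::Ranges
    } else {
        SelectionStrategy::Bitmask
    }
}
```

A mask made of runs of 48 selected rows picks `Ranges` with a cutoff of 32 and `Bitmask` with a cutoff of 64. No single constant is obviously correct on every machine.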
   
   One thing we have been exploring is a more dynamic approach: switching the
predicate evaluation strategy at points where the decoder naturally has to
re-create some state anyway, such as between row groups, as in this PR:
   - https://github.com/apache/arrow-rs/pull/10158
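A minimal sketch of the "only switch at row group boundaries" idea is shown here. It assumes the reader can measure a cost per row (e.g. nanoseconds) for the row group it just decoded. `Strategy` and `AdaptiveChooser` are hypothetical names, not the API of #10158 or of this PR, and the smoothing factor is arbitrary.

```rust
/// Hypothetical predicate evaluation strategies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Bitmask,
    Ranges,
}

/// Tracks a smoothed cost per row for each strategy; `None` until tried.
#[derive(Debug, Default)]
pub struct AdaptiveChooser {
    bitmask_cost: Option<f64>,
    ranges_cost: Option<f64>,
}

impl AdaptiveChooser {
    /// Weight given to the newest observation (arbitrary for this sketch).
    const ALPHA: f64 = 0.5;

    /// Meant to be called only at a row group boundary, where the decoder
    /// rebuilds its state anyway, so switching adds no extra reset.
    pub fn choose_for_next_row_group(&self) -> Strategy {
        match (self.bitmask_cost, self.ranges_cost) {
            // Try each strategy once before comparing.
            (None, _) => Strategy::Bitmask,
            (Some(_), None) => Strategy::Ranges,
            (Some(b), Some(r)) => {
                if r < b {
                    Strategy::Ranges
                } else {
                    Strategy::Bitmask
                }
            }
        }
    }

    /// Record the observed cost per row of the row group just decoded,
    /// folded into an exponential moving average.
    pub fn record(&mut self, strategy: Strategy, cost_per_row: f64) {
        let slot = match strategy {
            Strategy::Bitmask => &mut self.bitmask_cost,
            Strategy::Ranges => &mut self.ranges_cost,
        };
        *slot = Some(match *slot {
            None => cost_per_row,
            Some(prev) => Self::ALPHA * cost_per_row + (1.0 - Self::ALPHA) * prev,
        });
    }
}
```

Once both costs are known, this sketch only re-measures the strategy it is currently using. A more robust version would probably re-try the other one occasionally, because selectivity can drift across row groups and files.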
   
   It seems you have taken a similar approach in this PR:
   
   > Adds an adaptive post-filter cost model for row groups
   
   (Caveat: I have not had a chance to read this one carefully yet, and I
apologize for that.)
   
   I think we had been planning to put more of the adaptivity at a higher level
(DataFusion specifically), as it has more information about things like
statistics and cross-file predicate selectivity.
   
   I wonder whether you have thought about where these automatic, adaptive
decisions would best be made.
   
   I do think the APIs you have outlined allow for both automatic decisions and
manual overrides (e.g. DataFusion could override the decisions made
automatically), which is interesting.
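Here is a minimal sketch of how such a policy could be expressed. The reader decides by default, and a higher-level engine such as DataFusion can force a choice. `Strategy`, `StrategyPolicy`, and `resolve` are hypothetical names, not the APIs outlined in this PR.

```rust
/// Hypothetical predicate evaluation strategies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Bitmask,
    Ranges,
}

/// Hypothetical knob: let the reader adapt on its own, or let a caller
/// with more context (statistics, cross-file selectivity) pin the choice.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum StrategyPolicy {
    #[default]
    Auto,
    Force(Strategy),
}

/// Resolve the policy. The automatic heuristic is passed as a closure so it
/// is only evaluated when no override is present.
pub fn resolve(policy: StrategyPolicy, automatic: impl FnOnce() -> Strategy) -> Strategy {
    match policy {
        StrategyPolicy::Auto => automatic(),
        StrategyPolicy::Force(s) => s,
    }
}
```

Keeping `Auto` as the default means callers that never set the knob still get the reader's own adaptive behavior.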

