pjmore opened a new pull request #278:
URL: https://github.com/apache/arrow-datafusion/pull/278


   Added a short-circuiting extreme-values check for IN lists.
   
   # Which issue does this PR close?
   
   fixes #145 
   
   # What changes are included in this PR?
   Various changes to the InList physical expression.
   
   The macro-based solution was replaced with generic functions, which ended up being a little symbol-soupy. If the macro-based solution is preferred, I'd be happy to switch back to a macro.
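For illustration only, this is roughly what the generic-function approach looks like in place of a per-type macro expansion. It is a simplified sketch over plain Rust slices rather than Arrow arrays. The name in_list_generic is hypothetical, and the sketch ignores the case where the IN list itself contains NULL.

```rust
/// Hypothetical generic replacement for a per-type macro: one function
/// evaluates `value IN (list)` for each row of a column of any type `T`
/// that supports equality. A `None` row (SQL NULL) yields a `None` result.
pub fn in_list_generic<T: PartialEq>(column: &[Option<T>], list: &[T]) -> Vec<Option<bool>> {
    column
        .iter()
        // For each non-null row, linearly scan the list for an equal item.
        .map(|row| row.as_ref().map(|v| list.iter().any(|item| item == v)))
        .collect()
}
```

The same function then serves i32, i64, strings, and so on, where a macro would generate one copy per type; the cost is the trait bounds in the signature.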
   
   List of optimizations:
   - Use a HashSet to check for membership when the list is large.
   - For numeric lists, sort the list of values:
       - Check that the value falls within the list's extreme values (its minimum and maximum), which short-circuits the linear search for out-of-range values.
   - Explicitly lift the checks for negated and contains_null outside of the loop. It probably doesn't matter, and if it did, the compiler probably already does this, but I thought it made sense to be explicit.
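A minimal sketch of the sorted-list optimizations above, assuming an i64 column represented as a slice of Option<i64> (None is SQL NULL) instead of an Arrow array. SortedInList and its methods are hypothetical names, not DataFusion's API. The min/max check rejects out-of-range values before the linear scan, and the negated/contains_null flags are matched once, outside the per-row loop.

```rust
/// Hypothetical IN-list evaluator for a numeric (i64) list.
pub struct SortedInList {
    values: Vec<i64>, // kept sorted ascending, so first/last are min/max
    negated: bool,
    contains_null: bool,
}

impl SortedInList {
    pub fn new(mut values: Vec<i64>, negated: bool, contains_null: bool) -> Self {
        values.sort_unstable();
        Self { values, negated, contains_null }
    }

    fn contains(&self, v: i64) -> bool {
        match (self.values.first(), self.values.last()) {
            // Short circuit: a value outside [min, max] cannot be in the list.
            (Some(&min), Some(&max)) if v < min || v > max => false,
            // Otherwise fall back to a linear scan.
            _ => self.values.contains(&v),
        }
    }

    pub fn evaluate(&self, column: &[Option<i64>]) -> Vec<Option<bool>> {
        // Branch on the flags once; each arm is a tight loop with no flag checks.
        match (self.negated, self.contains_null) {
            (false, false) => column.iter().map(|&r| r.map(|v| self.contains(v))).collect(),
            (true, false) => column.iter().map(|&r| r.map(|v| !self.contains(v))).collect(),
            // Not found + NULL in list => NULL; found => true.
            (false, true) => column
                .iter()
                .map(|&r| r.and_then(|v| if self.contains(v) { Some(true) } else { None }))
                .collect(),
            // Not found + NULL in list => NULL; found => false.
            (true, true) => column
                .iter()
                .map(|&r| r.and_then(|v| if self.contains(v) { Some(false) } else { None }))
                .collect(),
        }
    }
}
```

The NULL handling follows SQL three-valued logic: when the value is not found and the list contains NULL, the result is NULL rather than false (or rather than true for NOT IN).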
   
   The thresholds for choosing between a linear scan and a HashSet could be tuned further.
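A sketch of choosing between a linear scan and a HashSet based on list length. The cut-over constant HASH_THRESHOLD = 8 is an arbitrary placeholder, not a tuned value, and ListLookup is a hypothetical type.

```rust
use std::collections::HashSet;

/// Hypothetical cut-over point; the real threshold would need benchmarking.
const HASH_THRESHOLD: usize = 8;

/// Membership lookup that picks its strategy once, at construction time.
pub enum ListLookup {
    Scan(Vec<i64>),
    Hashed(HashSet<i64>),
}

impl ListLookup {
    pub fn new(values: Vec<i64>) -> Self {
        if values.len() > HASH_THRESHOLD {
            // Large list: pay the hashing cost up front for O(1) lookups.
            ListLookup::Hashed(values.into_iter().collect())
        } else {
            // Small list: a linear scan avoids hashing entirely.
            ListLookup::Scan(values)
        }
    }

    pub fn contains(&self, v: i64) -> bool {
        match self {
            ListLookup::Scan(values) => values.contains(&v),
            ListLookup::Hashed(set) => set.contains(&v),
        }
    }
}
```

A scan over a handful of values is usually cheap, while hashing tends to pay off as the list grows; where that crossover sits is what the threshold tuning would determine.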
   
   # Are there any user-facing changes?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
