pepijnve opened a new issue, #18496:
URL: https://github.com/apache/datafusion/issues/18496

   ### Is your feature request related to a problem or challenge?
   
   Histograms in SQL can be computed fairly easily using `case` and `order by`.
   
   An example of this kind of query is:
   ```
   select 
       case 
           when salary between 75000 and 90000 then '75000-90000' 
           when salary between 90000 and 120000 then '90000-120000'
           else '120000+'
       end as salary_band, 
       count(*) 
   from employee_salary
   group by 1
   ```
   
   The default evaluation logic (which is logically equivalent to a chain of 
`if/else` statements) does not leverage the fact that the ranges being tested 
here are static and disjunct. In the example above it would be more optimal to 
compute the boundaries as an array `[75000, 90000, 120000]` and then try to 
find the appropriate bucket index by scanning that array.
   
   It would be useful to have a dedicated case evaluation code path for this 
that does this more optimally. This is similar in idea to the work that's being 
done in https://github.com/apache/datafusion/pull/18183 for equality checks.
   
   ### Describe the solution you'd like
   
   In the `case` physical expression implementation, analyse the when 
conditions. If they are all range tests for the same input value using literals 
for the range boundaries and the order of the when expressions permits it, 
extract the range boundaries to an array and test the input value against the 
array to determine the appropriate then expression index.
   
   If the then expressions are all literals, then there's also no need to 
actually evaluate those. Instead a lookup table can be used to map each row to 
the appropriate output value.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to