Re: [PR] perf: optimize CASE WHEN lookup table (2.5-22.5 times faster) [datafusion]

via GitHub Thu, 04 Dec 2025 23:23:50 -0800


pepijnve commented on PR #18183:
URL: https://github.com/apache/datafusion/pull/18183#issuecomment-3615601595


   Looking at the benchmark results, a nice follow-up to this work might be to pattern-match expressions of the form
   ```
   CASE
       WHEN <expr> == <literal> THEN <literal>
       WHEN <expr> == <literal> THEN <literal>
       ....
       WHEN <expr> == <literal> THEN <literal>
       ELSE <literal>
   END
   ```
   
   and transform them to `CASE <expr> WHEN <literal> THEN <literal> ...` so that the two forms become identical and both benefit from the lookup table optimization
   
   ```
   case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END    1.00     47.6±0.37ms    ? ?/sec    1.01      48.1±1.09ms
   case_when 8192x50: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                   1.00     67.1±0.45µs    ? ?/sec    787.97    52.9±0.57ms
   ```
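
As a rough illustration of the rewrite described above, here is a minimal sketch on a toy expression tree. The types `Expr`, `SearchedCase`, `SimpleCase` and the function `to_simple_case` are hypothetical stand-ins invented for this example; they are not DataFusion's actual types or API. The sketch also assumes the compared expression is deterministic, since the simple form evaluates it once rather than once per branch.

```rust
// Toy expression tree. Hypothetical stand-in types, not DataFusion's.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
}

/// Searched form: CASE WHEN <cond> THEN <result> ... ELSE <else> END
#[derive(Debug, Clone, PartialEq)]
struct SearchedCase {
    when_then: Vec<(Expr, Expr)>,
    else_expr: Option<Expr>,
}

/// Simple form: CASE <operand> WHEN <value> THEN <result> ... ELSE <else> END
#[derive(Debug, Clone, PartialEq)]
struct SimpleCase {
    operand: Expr,
    when_then: Vec<(Expr, Expr)>,
    else_expr: Option<Expr>,
}

/// Rewrites the searched form into the simple form when every condition is
/// `<expr> == <literal>` with the same `<expr>` on the left-hand side.
/// Returns `None` when the pattern does not apply.
fn to_simple_case(case: &SearchedCase) -> Option<SimpleCase> {
    let mut operand: Option<&Expr> = None;
    let mut when_then = Vec::with_capacity(case.when_then.len());

    for (cond, result) in &case.when_then {
        // Each condition must be an equality against a literal.
        let Expr::Eq(lhs, rhs) = cond else {
            return None;
        };
        if !matches!(**rhs, Expr::Literal(_)) {
            return None;
        }
        // Every branch must compare the same left-hand expression.
        let lhs: &Expr = &**lhs;
        match operand {
            Some(seen) if seen != lhs => return None,
            _ => operand = Some(lhs),
        }
        when_then.push(((**rhs).clone(), result.clone()));
    }

    Some(SimpleCase {
        // `?` rejects a CASE with no WHEN branches at all.
        operand: operand?.clone(),
        when_then,
        else_expr: case.else_expr.clone(),
    })
}
```

A real optimizer rule would also need to handle the literal on the left-hand side (`<literal> == <expr>`) and decide how to treat volatile expressions; both are left out of this sketch.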
   
   Another follow-up I'm considering is applying the same "compile a lookup data structure" technique to patterns like
   
   ```
   CASE
       WHEN <expr> < <literal1> THEN <literal>
       WHEN <expr> < <literal2> THEN <literal>
       ....
       WHEN <expr> < <literaln> THEN <literal>
       ELSE <literal>
   END
   ```
   
   assuming `<literal1> < <literal2> < ... < <literaln>`, which could be quite beneficial for histogram-style bucketing queries.
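
To make the idea concrete, here is a minimal sketch of one possible lookup structure for the range pattern: a sorted threshold array searched with binary search. The `RangeLookup` type and its methods are hypothetical names made up for this example and are not part of DataFusion. It assumes `i64` values and ignores NULL handling.

```rust
/// Compiled form of
///   CASE WHEN x < t1 THEN r1 WHEN x < t2 THEN r2 ... ELSE e END
/// with strictly ascending thresholds t1 < t2 < ... < tn.
/// Hypothetical type for illustration only.
struct RangeLookup {
    thresholds: Vec<i64>,
    results: Vec<i64>,
    else_result: i64,
}

impl RangeLookup {
    /// `branches` holds (threshold, result) pairs in WHEN order.
    /// Returns `None` if the thresholds are not strictly ascending,
    /// in which case the optimization does not apply.
    fn new(branches: &[(i64, i64)], else_result: i64) -> Option<Self> {
        let ascending = branches.windows(2).all(|w| w[0].0 < w[1].0);
        if !ascending {
            return None;
        }
        Some(Self {
            thresholds: branches.iter().map(|b| b.0).collect(),
            results: branches.iter().map(|b| b.1).collect(),
            else_result,
        })
    }

    /// Evaluates the CASE for one value in O(log n) instead of O(n).
    fn eval(&self, x: i64) -> i64 {
        // The first branch with `x < t` is the one right after all
        // thresholds that satisfy `t <= x`.
        let idx = self.thresholds.partition_point(|&t| t <= x);
        // Falling off the end means no branch matched: use ELSE.
        self.results.get(idx).copied().unwrap_or(self.else_result)
    }
}
```

For a histogram with thresholds 10, 20, 30 this maps values below 10 to the first bucket, 10 through 19 to the second, 20 through 29 to the third, and everything else to the ELSE result. A vectorized version would apply the same search per row of an array; that part is omitted here.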


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

