pepijnve commented on PR #18183:
URL: https://github.com/apache/datafusion/pull/18183#issuecomment-3615601595
Looking at the benchmark results, a nice extension/follow-up to this work
might be to pattern match
```
CASE
WHEN <expr> == <literal> THEN <literal>
WHEN <expr> == <literal> THEN <literal>
....
WHEN <expr> == <literal> THEN <literal>
ELSE <literal>
END
```
and transform it to `CASE <expr> WHEN <literal> THEN <literal> ...`, so that
the two forms become equivalent and the searched form gets the same speedup:
```
case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END    1.00     47.6±0.37ms    ? ?/sec    1.01     48.1±1.09ms
case_when 8192x50: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                   1.00     67.1±0.45µs    ? ?/sec    787.97   52.9±0.57ms
```
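Here is a minimal Rust sketch of the suggested rewrite. It runs over a hypothetical, simplified expression type: `Expr`, `SearchedCase`, `SimpleCase`, `to_simple_case`, `compile` and `eval` are all made up for this illustration and are not DataFusion's actual API. The sketch assumes non-null `i64` values and recognises only `<expr> == <literal>` with the literal on the right. When every branch compares the same expression against a literal, the WHEN/THEN pairs can be compiled into a hash map, so each row costs a single lookup instead of up to n comparisons.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
}

/// Searched CASE: CASE WHEN <cond> THEN <result> ... ELSE <result> END
struct SearchedCase {
    branches: Vec<(Expr, Expr)>,
    else_expr: Expr,
}

/// Simple CASE: CASE <base> WHEN <literal> THEN <literal> ... ELSE <literal> END
struct SimpleCase {
    base: Expr,
    branches: Vec<(i64, i64)>,
    else_value: i64,
}

/// Rewrites a searched CASE into a simple CASE if every branch has the shape
/// `<same expr> == <literal> THEN <literal>` and the ELSE is a literal.
fn to_simple_case(case: &SearchedCase) -> Option<SimpleCase> {
    let else_value = match &case.else_expr {
        Expr::Literal(v) => *v,
        _ => return None,
    };
    let mut base: Option<&Expr> = None;
    let mut branches = Vec::with_capacity(case.branches.len());
    for (cond, result) in &case.branches {
        // Only `<expr> == <literal>` is matched; `<literal> == <expr>` is not handled here.
        let (lhs, when_value) = match cond {
            Expr::Eq(l, r) => match r.as_ref() {
                Expr::Literal(v) => (l.as_ref(), *v),
                _ => return None,
            },
            _ => return None,
        };
        // Every branch must compare the same expression.
        if let Some(b) = base {
            if b != lhs {
                return None;
            }
        } else {
            base = Some(lhs);
        }
        let then_value = match result {
            Expr::Literal(v) => *v,
            _ => return None,
        };
        branches.push((when_value, then_value));
    }
    Some(SimpleCase { base: base?.clone(), branches, else_value })
}

impl SimpleCase {
    /// Compiles the WHEN/THEN pairs into a hash lookup table.
    fn compile(&self) -> HashMap<i64, i64> {
        let mut table = HashMap::with_capacity(self.branches.len());
        for &(when_value, then_value) in &self.branches {
            // CASE returns the first matching branch, so keep the first entry per key.
            table.entry(when_value).or_insert(then_value);
        }
        table
    }

    /// Evaluates one non-null input value against the compiled table.
    fn eval(&self, table: &HashMap<i64, i64>, input: i64) -> i64 {
        table.get(&input).copied().unwrap_or(self.else_value)
    }
}
```

Keeping only the first entry for a duplicated WHEN literal preserves CASE's first-match semantics. A real rewrite would also have to deal with NULL inputs and type coercion, which this sketch ignores.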
A similar further follow-up I'm considering is applying the same "compile
lookup data structure" technique to patterns like
```
CASE
WHEN <expr> < <literal1> THEN <literal>
WHEN <expr> < <literal2> THEN <literal>
....
WHEN <expr> < <literaln> THEN <literal>
ELSE <literal>
END
```
assuming `<literal1> < <literal2> < ... < <literaln>`, which could be quite
beneficial for histogram-style bucketing.
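For the ordered `<` pattern, one possible lookup structure is a sorted array of thresholds searched with binary search. The first branch that matches is the one with the first threshold strictly greater than the input. The sketch assumes non-null `i64` values and strictly increasing thresholds, as in the comment above. `RangeLookup` is a hypothetical name used only here, not an existing DataFusion type.

```rust
/// Hypothetical compiled form of
/// CASE WHEN x < t1 THEN v1 WHEN x < t2 THEN v2 ... ELSE v_else END
struct RangeLookup {
    thresholds: Vec<i64>,
    values: Vec<i64>,
    else_value: i64,
}

impl RangeLookup {
    /// Builds the lookup from (threshold, value) pairs; returns None unless
    /// the thresholds are strictly increasing (the assumption stated above).
    fn new(branches: &[(i64, i64)], else_value: i64) -> Option<Self> {
        if !branches.windows(2).all(|w| w[0].0 < w[1].0) {
            return None;
        }
        let thresholds = branches.iter().map(|&(t, _)| t).collect();
        let values = branches.iter().map(|&(_, v)| v).collect();
        Some(RangeLookup { thresholds, values, else_value })
    }

    /// Finds the first branch with `x < threshold` using O(log n) comparisons.
    fn eval(&self, x: i64) -> i64 {
        // The count of thresholds <= x equals the index of the first threshold > x.
        let idx = self.thresholds.partition_point(|&t| t <= x);
        // Past the last threshold, no WHEN matched, so fall through to ELSE.
        self.values.get(idx).copied().unwrap_or(self.else_value)
    }
}
```

For example, with thresholds `10, 20, 30` an input of `10` fails `x < 10` and matches `x < 20`, so it lands in the second bucket. That matches the sequential CASE evaluation.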
--
This is an automated message from the Apache Git Service.