acking-you commented on issue #11212:
URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205396425

   > > possibility that left might already be false and op is And, or that left 
might be true and op is Or.
   > 
   > In general I think there is a tradeoff between doing short circuiting 
(what I think this ticket is describing) and having to check if each row should 
be evaluted
   > 
   > So for a predicate like `(a = 5) AND (b = 10)` it is very likely faster to 
evaluate `(a = b)` and (`b = 10`) with very tight loops / SIMD instructions and 
then `&&` the resutlting boolean together that it would be to evalute (`a = b`) 
and then have a loop that checked the result for each row when evaluting `b = 
10`.
   > 
   > However, for an example like `(a = 5) AND (very_expensive_udf(b) = 10)` it 
may well be faster to do short circuting execution
   > 
   > Note DataFusion already does short circuting for evaluating `CASE`
   > 
   > 
https://github.com/apache/datafusion/blob/4f4cd81de72a858896ac37a51b0e354cb379307c/datafusion/physical-expr/src/expressions/case.rs#L125-L187
   > 
   > SO TLDR I think this would be an interesting optimization to explore and 
as @Dandandan notes finding some benchmarks where it matters is likely a good 
first step
   
   Adding short-circuiting after the left calculation alone resulted in 
performance improvements in my local TPCH tests.
   [commit 
link](https://github.com/acking-you/arrow-datafusion/commit/fda9a6c2c092d10eca114144e49a710fbda4d68b)
   
   Here are the test results for my local TPCH run (on my M3Pro chip with 36GB 
of RAM):
   ```
   Comparing main and add_short_circuit
   --------------------
   Benchmark tpch_sf1.json
   --------------------
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃     main ┃ add_short_circuit ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1     │  89.54ms │           90.81ms │     no change │
   │ QQuery 2     │  24.09ms │           20.56ms │ +1.17x faster │
   │ QQuery 3     │  37.53ms │           38.00ms │     no change │
   │ QQuery 4     │  23.99ms │           23.73ms │     no change │
   │ QQuery 5     │  61.97ms │           54.52ms │ +1.14x faster │
   │ QQuery 6     │  21.62ms │           20.98ms │     no change │
   │ QQuery 7     │  75.46ms │           75.51ms │     no change │
   │ QQuery 8     │  49.76ms │           50.09ms │     no change │
   │ QQuery 9     │  68.53ms │           68.13ms │     no change │
   │ QQuery 10    │  61.78ms │           60.36ms │     no change │
   │ QQuery 11    │  13.16ms │           13.55ms │     no change │
   │ QQuery 12    │  34.62ms │           34.35ms │     no change │
   │ QQuery 13    │  41.50ms │           42.54ms │     no change │
   │ QQuery 14    │  32.07ms │           30.29ms │ +1.06x faster │
   │ QQuery 15    │  43.47ms │           43.82ms │     no change │
   │ QQuery 16    │  18.59ms │           16.59ms │ +1.12x faster │
   │ QQuery 17    │  97.63ms │           95.67ms │     no change │
   │ QQuery 18    │ 115.07ms │          111.64ms │     no change │
   │ QQuery 19    │  53.83ms │           57.37ms │  1.07x slower │
   │ QQuery 20    │  45.42ms │           44.09ms │     no change │
   │ QQuery 21    │  82.97ms │           82.69ms │     no change │
   │ QQuery 22    │  16.46ms │           13.59ms │ +1.21x faster │
   └──────────────┴──────────┴───────────────────┴───────────────┘
   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Benchmark Summary                ┃           ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ Total Time (main)                │ 1109.06ms │
   │ Total Time (add_short_circuit)   │ 1088.88ms │
   │ Average Time (main)              │   50.41ms │
   │ Average Time (add_short_circuit) │   49.49ms │
   │ Queries Faster                   │         5 │
   │ Queries Slower                   │         1 │
   │ Queries with No Change           │        16 │
   └──────────────────────────────────┴───────────┘
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to