UBarney commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078931663
> Thanks [@nuno-faria](https://github.com/nuno-faria) that's a great insight (for TPC-H / very nested joins we probably should implement a smarter join order algorithm). > > For h2o joins however, it seems it is only doing a single join (https://github.com/apache/datafusion/blob/main/benchmarks/queries/h2o/join.sql) so I would expect the order to be correct there? The join order seems to be correct. `benchmarks/results/join/h2o_join.json` is the result from running the original join.sql, while `benchmarks/results/join/h2o_join_exch_join_order.json` is the result after swapping all the join orders in `join.sql`. As can be seen, the latter is slower. <details> <summary>h2o_join_exch_join_order.sql</summary> ```sql SELECT x.id1, x.id2, x.id3, x.id4 as xid4, small.id4 as smallid4, x.id5, x.id6, x.v1, small.v2 FROM small INNER JOIN x ON x.id1 = small.id1; SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, medium.v2 FROM medium INNER JOIN x ON x.id2 = medium.id2; SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, medium.v2 FROM medium RIGHT JOIN x ON x.id2 = medium.id2; SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, medium.v2 FROM medium JOIN x ON x.id5 = medium.id5; SELECT x.id1 as xid1, large.id1 as largeid1, x.id2 as xid2, large.id2 as largeid2, x.id3, x.id4 as xid4, large.id4 as largeid4, x.id5 as xid5, large.id5 as largeid5, x.id6 as xid6, large.id6 as largeid6, x.v1, large.v2 FROM large JOIN x ON x.id3 = large.id3; ``` </details> ``` √ devhomeinsp ~/c/datafusion > python3 benchmarks/compare.py benchmarks/results/join/h2o_join.json benchmarks/results/join/h2o_join_exch_join_order.json ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓ ┃ Query ┃ join ┃ join ┃ Change ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩ │ QQuery 1 │ 1.77 ms │ 1.70 ms │ no change │ │ QQuery 2 │ 12.38 ms │ 12.90 ms │ no change │ │ QQuery 3 │ 12.23 ms │ 12.31 ms │ no change │ │ QQuery 4 │ 10.96 ms │ 12.17 ms │ 1.11x slower │ │ QQuery 5 │ 2206.66 ms │ 3882.40 ms │ 1.76x slower │ └──────────────┴────────────┴────────────┴──────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓ ┃ Benchmark Summary ┃ ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩ │ Total Time (join) │ 2244.00ms │ │ Total Time (join) │ 3921.49ms │ │ Average Time (join) │ 448.80ms │ │ Average Time (join) │ 784.30ms │ │ Queries Faster │ 0 │ │ Queries Slower │ 2 │ │ Queries with No Change │ 3 │ │ Queries with Failure │ 0 │ └────────────────────────┴───────────┘ ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org