ariel-miculas commented on PR #20823: URL: https://github.com/apache/datafusion/pull/20823#issuecomment-4202604980
I ran some tests with clickbench; reading from local files is much slower with this change (roughly 8x to 12x slower on every query that completes):

```
[ec2-user@ip-172-31-0-185 datafusion]$ ./benchmarks/bench.sh compare json-test-on-main test-json-improvement
Comparing json-test-on-main and test-json-improvement
--------------------
Benchmark clickbench_2.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃ json-test-on-main ┃ test-json-improvement ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │        2938.54 ms │           36468.92 ms │ 12.41x slower │
│ QQuery 1  │        4189.48 ms │           36706.26 ms │  8.76x slower │
│ QQuery 2  │        3021.24 ms │           36695.04 ms │ 12.15x slower │
│ QQuery 3  │              FAIL │                  FAIL │  incomparable │
│ QQuery 4  │        3518.24 ms │           37016.08 ms │ 10.52x slower │
│ QQuery 5  │        3138.41 ms │           37131.63 ms │ 11.83x slower │
│ QQuery 6  │              FAIL │                  FAIL │  incomparable │
│ QQuery 7  │        4191.68 ms │           36874.60 ms │  8.80x slower │
│ QQuery 8  │        4405.33 ms │           37054.97 ms │  8.41x slower │
│ QQuery 9  │        3473.41 ms │           37308.28 ms │ 10.74x slower │
│ QQuery 10 │        4351.06 ms │           36934.39 ms │  8.49x slower │
│ QQuery 11 │        3306.45 ms │           37101.39 ms │ 11.22x slower │
│ QQuery 12 │        3226.21 ms │           37235.60 ms │ 11.54x slower │
│ QQuery 13 │        3970.11 ms │           37244.27 ms │  9.38x slower │
│ QQuery 14 │        3246.59 ms │           37085.69 ms │ 11.42x slower │
│ QQuery 15 │        4563.53 ms │           37182.89 ms │  8.15x slower │
│ QQuery 16 │        4506.85 ms │           37391.07 ms │  8.30x slower │
│ QQuery 17 │        4377.16 ms │           37381.49 ms │  8.54x slower │
│ QQuery 18 │        3555.18 ms │           37603.25 ms │ 10.58x slower │
│ QQuery 19 │        4568.01 ms │           36996.50 ms │  8.10x slower │
│ QQuery 20 │        3193.87 ms │           37069.19 ms │ 11.61x slower │
│ QQuery 21 │        4415.33 ms │           37185.73 ms │  8.42x slower │
│ QQuery 22 │        3312.73 ms │           37190.81 ms │ 11.23x slower │
│ QQuery 23 │              FAIL │                  FAIL │  incomparable │
│ QQuery 24 │        4382.53 ms │           37093.81 ms │  8.46x slower │
│ QQuery 25 │        4339.69 ms │           37121.90 ms │  8.55x slower │
│ QQuery 26 │        4425.42 ms │           37106.02 ms │  8.38x slower │
│ QQuery 27 │        4505.30 ms │           37059.04 ms │  8.23x slower │
│ QQuery 28 │        3582.82 ms │           37409.12 ms │ 10.44x slower │
│ QQuery 29 │        4440.96 ms │           36868.93 ms │  8.30x slower │
│ QQuery 30 │        4675.71 ms │           37081.23 ms │  7.93x slower │
│ QQuery 31 │        4276.55 ms │           37165.64 ms │  8.69x slower │
│ QQuery 32 │        3615.42 ms │           37662.39 ms │ 10.42x slower │
│ QQuery 33 │        4446.09 ms │           37558.30 ms │  8.45x slower │
│ QQuery 34 │        4521.66 ms │           37647.72 ms │  8.33x slower │
│ QQuery 35 │        4321.41 ms │           37225.06 ms │  8.61x slower │
│ QQuery 36 │              FAIL │                  FAIL │  incomparable │
│ QQuery 37 │              FAIL │                  FAIL │  incomparable │
│ QQuery 38 │              FAIL │                  FAIL │  incomparable │
│ QQuery 39 │              FAIL │                  FAIL │  incomparable │
│ QQuery 40 │              FAIL │                  FAIL │  incomparable │
│ QQuery 41 │              FAIL │                  FAIL │  incomparable │
│ QQuery 42 │              FAIL │                  FAIL │  incomparable │
└───────────┴───────────────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Total Time (json-test-on-main)       │  131002.95ms │
│ Total Time (test-json-improvement)   │ 1225857.24ms │
│ Average Time (json-test-on-main)     │    3969.79ms │
│ Average Time (test-json-improvement) │   37147.19ms │
│ Queries Faster                       │            0 │
│ Queries Slower                       │           33 │
│ Queries with No Change               │            0 │
│ Queries with Failure                 │           10 │
└──────────────────────────────────────┴──────────────┘
```

The issue is that the `into_stream` function of object store's `get_result` reads data in 8 KiB chunks for local files, so we need to either replace it with custom code or use a completely separate path for local files, as was done previously.
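To make the chunk-size point concrete, here is a minimal std-only sketch of how the number of read calls scales with buffer size when reading a local file. It does not use the object store crate or any DataFusion code; `read_in_chunks`, the scratch file name, and the 1 MiB comparison size are hypothetical choices made for illustration only.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Reads `path` to the end with a fixed-size buffer and returns
/// (total bytes read, number of `read` calls that returned data).
/// Hypothetical helper for illustration; not part of any library.
fn read_in_chunks(path: &Path, chunk_size: usize) -> io::Result<(usize, usize)> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0;
    let mut reads = 0;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        total += n;
        reads += 1;
    }
    Ok((total, reads))
}

fn main() -> io::Result<()> {
    // Write a 4 MiB scratch file to the system temp directory.
    let path = std::env::temp_dir().join("chunk_size_sketch.bin");
    File::create(&path)?.write_all(&vec![0u8; 4 * 1024 * 1024])?;

    // Compare an 8 KiB buffer with an (assumed) 1 MiB buffer.
    for chunk_size in [8 * 1024, 1024 * 1024] {
        let (total, reads) = read_in_chunks(&path, chunk_size)?;
        println!("chunk_size={chunk_size} total={total} reads={reads}");
    }

    std::fs::remove_file(&path)?;
    Ok(())
}
```

With an 8 KiB buffer, each call can return at most 8192 bytes, so a 4 MiB file needs at least 512 reads, while the larger buffer needs far fewer. Whether a bigger buffer or a dedicated local-file path is the better fix for this PR is a separate question that the sketch does not answer.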
