alamb commented on PR #21154: URL: https://github.com/apache/datafusion/pull/21154#issuecomment-4129981646
Locally, I was also able to reproduce about a 50% speedup.

Create a scale factor 100 dataset:

```shell
tpchgen-cli --format parquet --scale-factor=100 --tables partsupp
```

main:

```sql
> select ps_partkey, string_agg(ps_comment, ';') from 'partsupp.parquet' group by ps_partkey;
20000000 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 10.798 seconds.
```

This branch:

```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Downloads$ ./datafusion-cli-neilc_optimize-string-agg
DataFusion CLI v52.3.0
```

```sql
> select ps_partkey, string_agg(ps_comment, ';') from 'partsupp.parquet' group by ps_partkey;
...
20000000 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 6.600 seconds.
```
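For reference, the two elapsed times reported above can be turned into a speedup figure in more than one way, which is worth keeping in mind when reading "about 50%". The snippet below is only a minimal sketch of that arithmetic. The function name `speedup_summary` is hypothetical and not part of DataFusion, and the inputs are just the two timings shown in this comment.

```python
def speedup_summary(baseline_s: float, candidate_s: float) -> tuple[float, float]:
    """Return (throughput ratio, percent reduction in elapsed time).

    Hypothetical helper for illustration only; not a DataFusion API.
    """
    ratio = baseline_s / candidate_s
    reduction_pct = (baseline_s - candidate_s) / baseline_s * 100.0
    return ratio, reduction_pct


# Elapsed times reported in this comment, in seconds: main vs. this branch.
ratio, reduction_pct = speedup_summary(10.798, 6.600)
print(f"{ratio:.2f}x faster, {reduction_pct:.1f}% less elapsed time")
```

With these inputs the query runs roughly 1.64x faster on this branch, which corresponds to roughly 39% less elapsed time.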
