cube2222 opened a new issue #2109: URL: https://github.com/apache/arrow-datafusion/issues/2109
**Describe the bug**

I'm running benchmarks for [OctoSQL](https://github.com/cube2222/octosql), and datafusion-cli is one of the tools I compare against. The previous version I used (0.6.0, I think) ran the benchmark in 1.5 seconds. The new version takes 100 (!!!) seconds. It also prints "0 rows in set", which makes me think this is a regression in the CSV decoder.

The benchmark uses the NYC yellow taxi dataset.

**To Reproduce**

```bash
# Download the April 2021 NYC yellow taxi trip data
curl https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-04.csv -o taxi.csv
# Write the table definition and the benchmark query to a command file
echo "CREATE EXTERNAL TABLE taxi STORED AS CSV WITH HEADER ROW LOCATION './taxi.csv'; SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count" > datafusion_commands.txt
# Run the command file with datafusion-cli
datafusion-cli -f datafusion_commands.txt
```

**Expected behavior**

DataFusion is supposed to be blazingly fast.
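The "0 rows in set" output can be checked against an independent computation of the same `GROUP BY`, which confirms that `taxi.csv` really contains rows to aggregate. The sketch below does this with awk. `taxi_groupby` is a hypothetical helper and is not part of datafusion-cli. It assumes plain comma-separated data with no quoted commas and a header row that names `passenger_count` and `total_amount`. The columns are located by header name, so no fixed column positions are assumed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of datafusion-cli): recompute
#   SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count
# with awk, printing "passenger_count,count,avg" lines sorted by key.
# Assumes plain comma-separated data (no quoted commas) with a header row.
taxi_groupby() {
  awk -F',' '
    { sub(/\r$/, "") }                 # tolerate CRLF line endings (re-splits fields)
    NR == 1 {
      # Find the columns by header name rather than hard-coding positions.
      for (i = 1; i <= NF; i++) {
        if ($i == "passenger_count") pc = i
        if ($i == "total_amount") ta = i
      }
      if (!pc || !ta) { print "missing expected columns" > "/dev/stderr"; exit 1 }
      next
    }
    {
      cnt[$pc]++                       # COUNT(*) per passenger_count value
      sum[$pc] += $ta                  # running SUM(total_amount) per group
    }
    END {
      # AVG = SUM / COUNT; each group has at least one row, so no division by zero
      for (k in cnt) printf "%s,%d,%.2f\n", k, cnt[k], sum[k] / cnt[k]
    }
  ' "$1" | sort
}
```

Usage: `taxi_groupby taxi.csv`. A non-empty result here, next to "0 rows in set" from datafusion-cli on the same file, would point at the reader or query path rather than at the data.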
