Dandandan commented on issue #1363: URL: https://github.com/apache/arrow-datafusion/issues/1363#issuecomment-979899560
@rdettai No problem - just want to figure out what could be the reason :)!

So far I tested:

* master - performance regression
* 2454e468641d4d98af211c2800c0afec2732385b - regression
* d2d47d38b8c1b4605272d7f917406527cdf68bc9 - fast again

> after #1347 was merged

For TPCH, I remember that collecting stats doesn't have a big effect now, as the data is very evenly distributed, and I think the stats also don't take a long time to collect.

To reproduce:

* Create a partitioned Parquet dataset (the slowness seems to increase with the number of partitions? Not 100% sure yet)
* Run some queries: `cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path [data] --format parquet --query 6 --batch-size 8192 -p 16`
