Josh Taylor created ARROW-10275:
-----------------------------------

             Summary: [Rust] [Datafusion] GROUP BY with a high cardinality doesn't seem to finish
                 Key: ARROW-10275
                 URL: https://issues.apache.org/jira/browse/ARROW-10275
             Project: Apache Arrow
          Issue Type: Bug
          Components: Rust - DataFusion
    Affects Versions: 2.0.0
         Environment: Ubuntu 20.04
            Reporter: Josh Taylor

GROUP BY queries on high-cardinality columns (columns with many unique values) don't seem to finish.

I've tried both datafusion-cli and this program:
[https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]

When grouping by O_ORDERKEY, which has ~15,000,000 unique values, the query seems to stall. Adding a LIMIT doesn't help either.

My Parquet file: https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)