zhuqi-lucas commented on PR #16804: URL: https://github.com/apache/datafusion/pull/16804#issuecomment-3082603389
Updated, and it works now. falsa has merged the fix and released it: https://github.com/mrpowers-io/falsa/pull/28

```shell
./bench.sh data h2o_small_join_parquet
***************************
DataFusion Benchmark Runner and Data Generator
COMMAND: data
BENCHMARK: h2o_small_join_parquet
DATA_DIR: /Users/zhuqi/arrow-datafusion/benchmarks/data
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************
Found Python version 3.13, which is suitable.
Using Python command: /opt/homebrew/bin/python3
Installing falsa...
Generating h2o test data in /Users/zhuqi/arrow-datafusion/benchmarks/data/h2o with size=SMALL and format=PARQUET
10 rows will be saved into: /Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e7_1e1_0.parquet
10000 rows will be saved into: /Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e7_1e4_0.parquet
10000000 rows will be saved into: /Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e7_1e7_NA.parquet

An SMALL data schema is the following:
id1: int64 not null
id4: string not null
v2: double not null

An output format is PARQUET
Batch mode is supported.
In case of memory problems you can try to reduce a batch_size.
Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

An MEDIUM data schema is the following:
id1: int64 not null
id2: int64 not null
id4: string not null
id5: string not null
v2: double not null

An output format is PARQUET
Batch mode is supported.
In case of memory problems you can try to reduce a batch_size.
Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

An BIG data schema is the following:
id1: int64 not null
id2: int64 not null
id3: int64 not null
id4: string not null
id5: string not null
id6: string not null
v2: double not null

An output format is PARQUET
Batch mode is supported.
In case of memory problems you can try to reduce a batch_size.
Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02

An LSH data schema is the following:
id1: int64 not null
id2: int64 not null
id3: int64 not null
id4: string not null
id5: string not null
id6: string not null
v1: double not null

An output format is PARQUET
Batch mode is supported.
In case of memory problems you can try to reduce a batch_size.
Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
```