zhuqi-lucas commented on PR #14766: URL: https://github.com/apache/datafusion/pull/14766#issuecomment-2667624674
1. The memory usage is now accurate: it no longer collects all results into memory.
2. We now also register the datafusion-cli result batches with the memory pool.

Testing result for the 10G memory case: peak memory is now 5G:

```shell
/usr/bin/time -l cargo run --release -- --mem-pool-type fair -m 5G --maxrows 10 -f '/Users/zhuqi/arrow-datafusion/benchmarks/data/external_sort.sql'
   Compiling datafusion-cli v45.0.0 (/Users/zhuqi/arrow-datafusion/datafusion-cli)
    Finished `release` profile [optimized] target(s) in 6m 06s
     Running `/Users/zhuqi/arrow-datafusion/target/release/datafusion-cli --mem-pool-type fair -m 5G --maxrows 10 -f /Users/zhuqi/arrow-datafusion/benchmarks/data/external_sort.sql`
DataFusion CLI v45.0.0
0 row(s) fetched.
Elapsed 0.006 seconds.

memory pool: FairSpillPool { pool_size: 5368709120, state: Mutex { data: FairSpillPoolState { num_spill: 0, spillable: 0, unspillable: 0 } } }
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+------------+--------------+---------------+
| l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_shipdate | l_commitdate | l_receiptdate |
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+------------+--------------+---------------+
| 1          | 1551894   | 76910     | 1            | 17.00      | 33078.94        | 0.04       | 0.02  | 1996-03-13 | 1996-02-12   | 1996-03-22    |
| 1          | 673091    | 73092     | 2            | 36.00      | 38306.16        | 0.09       | 0.06  | 1996-04-12 | 1996-02-28   | 1996-04-20    |
| 1          | 636998    | 36999     | 3            | 8.00       | 15479.68        | 0.10       | 0.02  | 1996-01-29 | 1996-03-05   | 1996-01-31    |
| 1          | 21315     | 46316     | 4            | 28.00      | 34616.68        | 0.09       | 0.06  | 1996-04-21 | 1996-03-30   | 1996-05-16    |
| 1          | 240267    | 15274     | 5            | 24.00      | 28974.00        | 0.10       | 0.04  | 1996-03-30 | 1996-03-14   | 1996-04-01    |
| 1          | 156345    | 6348      | 6            | 32.00      | 44842.88        | 0.07       | 0.02  | 1996-01-30 | 1996-02-07   | 1996-02-03    |
| 2          | 1061698   | 11719     | 1            | 38.00      | 63066.32        | 0.00       | 0.05  | 1997-01-28 | 1997-01-14   | 1997-02-02    |
| 3          | 42970     | 17971     | 1            | 45.00      | 86083.65        | 0.06       | 0.00  | 1994-02-02 | 1994-01-04   | 1994-02-23    |
| 3          | 190355    | 65359     | 2            | 49.00      | 70822.15        | 0.10       | 0.00  | 1993-11-09 | 1993-12-20   | 1993-11-24    |
| 3          | 1284483   | 34508     | 3            | 27.00      | 39620.34        | 0.06       | 0.07  | 1994-01-16 | 1993-11-22   | 1994-01-23    |
| . |
| . |
| . |
+------------+-----------+-----------+--------------+------------+-----------------+------------+-------+------------+--------------+---------------+
81920 row(s) fetched. (First 10 displayed. Use --maxrows to adjust)
Elapsed 2.165 seconds.

      373.37 real         9.21 user         5.91 sys
  5073829888  maximum resident set size
           0  average shared memory size
           0  average unshared data size
           0  average unshared stack size
     1293674  page reclaims
           0  page faults
           0  swaps
           0  block input operations
           0  block output operations
           0  messages sent
           0  messages received
           0  signals received
        1906  voluntary context switches
       85462  involuntary context switches
200845261488  instructions retired
 55793294693  cycles elapsed
  5072421856  peak memory footprint
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
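
As a rough sanity check on the figures reported in the comment above, the sketch below compares the configured pool size (`pool_size: 5368709120`) with the two process-level numbers printed by `/usr/bin/time -l`. The constants are copied from that output; the helper names `to_gib` and `within_limit` are hypothetical and exist only for this illustration. Note that resident set size covers the whole process, not just pool-tracked allocations, so this is only an approximate comparison and not a statement about how the memory pool accounts internally.

```rust
// Figures copied from the run shown in the comment.
const POOL_SIZE_BYTES: u64 = 5_368_709_120; // FairSpillPool pool_size (-m 5G)
const MAX_RSS_BYTES: u64 = 5_073_829_888; // "maximum resident set size"
const PEAK_FOOTPRINT_BYTES: u64 = 5_072_421_856; // "peak memory footprint"

const GIB: u64 = 1024 * 1024 * 1024;

/// Convert a byte count to GiB (hypothetical helper, for illustration only).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / GIB as f64
}

/// Return true when the observed usage does not exceed the configured limit.
fn within_limit(observed: u64, limit: u64) -> bool {
    observed <= limit
}

fn main() {
    println!("pool size:      {:.2} GiB", to_gib(POOL_SIZE_BYTES));
    println!("max RSS:        {:.2} GiB", to_gib(MAX_RSS_BYTES));
    println!("peak footprint: {:.2} GiB", to_gib(PEAK_FOOTPRINT_BYTES));
    println!(
        "RSS within pool limit: {}",
        within_limit(MAX_RSS_BYTES, POOL_SIZE_BYTES)
    );
}
```

Under these numbers the pool size is exactly 5 GiB, and both process-level measurements sit slightly below it, which is consistent with the "5G peak memory" reported for this run.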