Dandandan commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2974721937
This is the diff for me:

```diff
--- i/datafusion/physical-plan/src/coalesce/mod.rs
+++ w/datafusion/physical-plan/src/coalesce/mod.rs
@@ -33,6 +33,9 @@ pub struct LimitedBatchCoalescer {
     fetch: Option<usize>,
     /// Indicates if the coalescer is finished
     finished: bool,
+
+    /// target batch size
+    target_batch_size: usize,
 }
 
 impl LimitedBatchCoalescer {
@@ -53,6 +56,7 @@ impl LimitedBatchCoalescer {
             total_rows: 0,
             fetch,
             finished: false,
+            target_batch_size,
         }
     }
 
@@ -92,9 +96,15 @@ impl LimitedBatchCoalescer {
             }
         }
 
+        let num_rows = batch.num_rows();
+
         self.total_rows += batch.num_rows();
         self.inner.push_batch(batch)?;
 
+        if num_rows > self.target_batch_size / 2 {
+            self.inner.finish_buffered_batch()?;
+        }
+
         Ok(false) // not at limit
     }
```

Looks like some queries benefit from this:

```
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ upstream_coalesce ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  62.59 ms │          47.70 ms │ +1.31x faster │
│ QQuery 2     │  12.52 ms │          12.70 ms │     no change │
│ QQuery 3     │  20.40 ms │          20.52 ms │     no change │
│ QQuery 4     │  13.52 ms │          12.76 ms │ +1.06x faster │
│ QQuery 5     │  31.40 ms │          30.72 ms │     no change │
│ QQuery 6     │  11.01 ms │          11.13 ms │     no change │
│ QQuery 7     │  67.99 ms │          68.61 ms │     no change │
│ QQuery 8     │  17.07 ms │          16.47 ms │     no change │
│ QQuery 9     │  38.69 ms │          35.99 ms │ +1.08x faster │
│ QQuery 10    │  32.05 ms │          28.81 ms │ +1.11x faster │
│ QQuery 11    │   5.80 ms │           5.78 ms │     no change │
│ QQuery 12    │  28.83 ms │          27.93 ms │     no change │
│ QQuery 13    │  18.29 ms │          17.37 ms │ +1.05x faster │
│ QQuery 14    │   5.24 ms │           5.68 ms │  1.08x slower │
│ QQuery 15    │  10.88 ms │          11.41 ms │     no change │
│ QQuery 16    │  13.53 ms │          12.77 ms │ +1.06x faster │
│ QQuery 17    │  54.94 ms │          62.53 ms │  1.14x slower │
│ QQuery 18    │ 118.23 ms │         115.08 ms │     no change │
│ QQuery 19    │  21.28 ms │          20.00 ms │ +1.06x faster │
│ QQuery 20    │  20.43 ms │          19.09 ms │ +1.07x faster │
│ QQuery 21    │  84.41 ms │          82.41 ms │     no change │
│ QQuery 22    │  12.25 ms │          10.74 ms │ +1.14x faster │
└──────────────┴───────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                │ 701.36ms │
│ Total Time (upstream_coalesce)   │ 676.22ms │
│ Average Time (main)              │  31.88ms │
│ Average Time (upstream_coalesce) │  30.74ms │
│ Queries Faster                   │        9 │
│ Queries Slower                   │        2 │
│ Queries with No Change           │       11 │
│ Queries with Failure             │        0 │
└──────────────────────────────────┴──────────┘
```
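
To make the heuristic in the diff easier to reason about, here is a minimal standalone sketch, assuming a toy model in which a "batch" is only its row count. `ToyCoalescer` and its fields are hypothetical names for illustration; this is not DataFusion's `LimitedBatchCoalescer` and does not call any DataFusion or Arrow API. It only mirrors the idea that an incoming batch larger than half the target size triggers a flush of whatever is buffered.

```rust
/// Toy stand-in for a batch coalescer (hypothetical, for illustration only).
/// A "batch" is modeled as just its number of rows.
struct ToyCoalescer {
    target_batch_size: usize,
    /// Rows buffered so far but not yet emitted.
    buffered_rows: usize,
    /// Row counts of the batches emitted so far.
    completed: Vec<usize>,
}

impl ToyCoalescer {
    fn new(target_batch_size: usize) -> Self {
        // A zero target would make the emit loop below spin forever.
        assert!(target_batch_size > 0, "target_batch_size must be non-zero");
        Self {
            target_batch_size,
            buffered_rows: 0,
            completed: Vec::new(),
        }
    }

    /// Emit whatever is buffered as one output batch, even if it is short.
    fn finish_buffered_batch(&mut self) {
        if self.buffered_rows > 0 {
            self.completed.push(self.buffered_rows);
            self.buffered_rows = 0;
        }
    }

    fn push_batch(&mut self, num_rows: usize) {
        self.buffered_rows += num_rows;

        // Ordinary coalescing: emit full batches of exactly the target size.
        while self.buffered_rows >= self.target_batch_size {
            self.completed.push(self.target_batch_size);
            self.buffered_rows -= self.target_batch_size;
        }

        // Heuristic mirrored from the diff: if the incoming batch is already
        // larger than half the target, flush instead of waiting for more rows.
        if num_rows > self.target_batch_size / 2 {
            self.finish_buffered_batch();
        }
    }
}

/// Push every input batch, flush the remainder, and return the output sizes.
fn coalesce_all(target_batch_size: usize, inputs: &[usize]) -> Vec<usize> {
    let mut coalescer = ToyCoalescer::new(target_batch_size);
    for &rows in inputs {
        coalescer.push_batch(rows);
    }
    coalescer.finish_buffered_batch();
    coalescer.completed
}
```

With a target of 8192, calling `coalesce_all(8192, &[100, 200, 5000, 50])` buffers the two small batches, then flushes 5300 rows as soon as the 5000-row batch arrives (5000 > 4096), and emits the trailing 50 rows at the end. Without the heuristic, the same input would be emitted as a single 5350-row batch at the final flush.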