2010YOUY01 commented on PR #23274: URL: https://github.com/apache/datafusion/pull/23274#issuecomment-4862286218
> @hhhizzz I reviewed the sorted path, and I think `emit first block` may not be able to satisfy its requirements. As I see it, maybe we should design specific `group values` for the sorted case. It could also make `sorted aggr` faster in theory.

An alternative is to remove `EmitTo::First(k)` entirely and only support `EmitTo::FirstBlock`.

Conceptually, `GroupOrdering`, `GroupValues`, and `GroupsAccumulator` are all chunked and aligned to the same block size, which must be configured during initialization. If we set it to 100, and at some point 280 groups have been accumulated, the internal layout is:

```
group_values: vec(100), vec(100), vec(100)
accumulator:  vec(100), vec(100), vec(100)
```

Each third block is allocated for 100 groups but holds only 80. `GroupOrdering` is aware that it may only emit at block-size granularity:

- If 120 groups have finished so far, `group_ordering.emit_to()` returns `EmitTo::FirstBlock`, so it is safe to remove and output the first block (a vec of length 100) from all the aligned `GroupValues`/`GroupsAccumulator`s.
- If only 80 groups have finished, `group_ordering.emit_to()` returns `None`, indicating that we can't emit early because the internal storage is block-aligned.
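A minimal Rust sketch of the block-aligned emission idea, under simplified assumptions: `BlockedVec`, `BlockedGroupOrdering`, `on_emitted_block`, and the one-variant `EmitTo` below are hypothetical illustrations, not DataFusion's actual `GroupValues`/`GroupsAccumulator`/`GroupOrdering` APIs. It replays the 280-group, block-size-100 example: with 80 finished groups `emit_to()` yields `None`, and with 120 it yields `FirstBlock`, after which both aligned stores drop their first block.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for DataFusion's `EmitTo`; only the proposed
/// block-granular variant is modeled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmitTo {
    FirstBlock,
}

/// Hypothetical block-aligned storage, shared in shape by group values and
/// accumulator state. Every block is allocated with `block_size` capacity;
/// only the last block may be partially filled.
struct BlockedVec<T> {
    block_size: usize,
    blocks: VecDeque<Vec<T>>,
}

impl<T> BlockedVec<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block size must be positive");
        Self { block_size, blocks: VecDeque::new() }
    }

    fn push(&mut self, value: T) {
        // Open a new block when there is none yet or the last one is full.
        let need_new = self.blocks.back().map_or(true, |b| b.len() == self.block_size);
        if need_new {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks.back_mut().unwrap().push(value);
    }

    fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }

    /// Removes the first block in O(1); the remaining groups are not shifted.
    fn take_first_block(&mut self) -> Option<Vec<T>> {
        self.blocks.pop_front()
    }
}

/// Hypothetical ordering tracker that only allows early emission of whole blocks.
struct BlockedGroupOrdering {
    block_size: usize,
    finished_groups: usize,
}

impl BlockedGroupOrdering {
    fn emit_to(&self) -> Option<EmitTo> {
        if self.finished_groups >= self.block_size {
            Some(EmitTo::FirstBlock)
        } else {
            None
        }
    }

    /// Bookkeeping after the caller emitted one block from every aligned store.
    fn on_emitted_block(&mut self) {
        self.finished_groups -= self.block_size;
    }
}

fn main() {
    let block_size = 100;
    let mut group_values: BlockedVec<u64> = BlockedVec::new(block_size);
    let mut accumulator: BlockedVec<i64> = BlockedVec::new(block_size);

    // 280 groups accumulated: three blocks per store, the last holding 80.
    for g in 0..280u64 {
        group_values.push(g);
        accumulator.push(0);
    }
    assert_eq!(group_values.blocks.len(), 3);
    assert_eq!(accumulator.len(), 280);

    // 80 finished groups: less than one full block, so no early emission.
    let ordering = BlockedGroupOrdering { block_size, finished_groups: 80 };
    assert_eq!(ordering.emit_to(), None);

    // 120 finished groups: the first block is complete and safe to emit.
    let mut ordering = BlockedGroupOrdering { block_size, finished_groups: 120 };
    if ordering.emit_to() == Some(EmitTo::FirstBlock) {
        let keys = group_values.take_first_block().unwrap();
        let states = accumulator.take_first_block().unwrap();
        assert_eq!((keys.len(), states.len()), (100, 100));
        ordering.on_emitted_block();
    }
    assert_eq!(ordering.finished_groups, 20);
    assert_eq!(ordering.emit_to(), None);
    assert_eq!(group_values.len(), 180);
    println!("ok");
}
```

Because every store in this sketch is chunked at the same boundary, emitting is a `pop_front` of one block, whereas an arbitrary `EmitTo::First(k)` on a flat `Vec` would have to split off and move the remaining groups.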
