Rachelint commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2287897129
@2010YOUY01 make sense, it seems `emit_early_if_necessary` function is actually introduced in the spilling pr #7400. I am checking the related codes about memory control, too. But the `FirstBlocks` is just the blocked version `First`, because it is too expansive to emit exact first n groups in blocked impls. For example: - `block size = 4`, `blocks=2`, and emit `first 3 groups`, - After emitting, `groups in first block = 1`, `groups in second block = 4` - And for keep the logic correct, we need to move groups to fill the first blocks to make: `groups in first block = 4`, `groups in second block = 1` Actually we can remove the `FirstBlocks` and `AllBlocks` mode, and we impl the related logics just out of `GroupValues` and `GroupAccumultor`. For example, in `emit_early_if_necessary`, we check if the `GroupValues` and `GroupAccumultor` are in blocked modes first. If so, we get the `block_size`, do the aligned work, and finally pass the aligned results in `First(aligned)` to them But I think if we impl like this, it will be so confused, and for making it clear, I introduce the two new blocked emission mode `FirstBlocks` and `AllBlocks`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org