milenkovicm commented on issue #7858: URL: https://github.com/apache/arrow-datafusion/issues/7858#issuecomment-1770357507
Thanks @kazuyukitanimura , @alamb for your comments, I guess there is no perfect way to address 1. Another alternative for 1 might be to sort data in `GroupValuesRows`. We could crate a `SortedGroupValuesRows` which would have `BTreeMap<GroupingKeyBlob, offset>` instead of https://github.com/apache/arrow-datafusion/blob/b6e4c8238031ae17373d9ae3be2def4b57645e42/datafusion/physical-plan/src/aggregates/group_values/row.rs#L47 this would make state sorted by key, thus no need for `RecordBatch` sorting later. There should be conversion from `GroupValuesRows` to `SortedGroupValuesRows` which would involve copying just a single table entry from Map to BTree. This conversion (sorting) would be performed on the first spill, `SortedGroupValuesRows` would then converted to a batch and spilled. Once the first spill has been done AggregateExec should continue using `SortedGroupValuesRows` for its state, as we know at that points that keys needed to be sortedfor eventual spill or merge. I get your idea regarding point 3, @kazuyukitanimura. I've tried to tune memory and batch size, what I observed that once the system is in border area this `if` will fail execution. It is quite indeterministic when it's going to fail. I agree that spill should not produce a lot of small file, but in practice I believe small spill would be just a temporary, until one of the bigger consumer spills their state, and they are as well under memory pressure. Thus, believe removing this particular checks would improve robustness. wdyt? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
