lidavidm commented on pull request #11019: URL: https://github.com/apache/arrow/pull/11019#issuecomment-914443252
> > > Another thought: isn't one attraction of TopK to have a streaming algorithm with O(k) memory consumption? Making a full sort requires to materialize the entire input, hence O(n) memory consumption.
> >
> > That is a fair point. However, this kernel as-is is not a streaming implementation and we would have to implement it as an actual aggregate kernel to get that behavior.
>
> I have a question related to that, the streaming implementation would be using ExecNodes right?

It's unclear to me. The API for scalar aggregate functions expects you to handle parallelism and streaming, but doesn't handle larger-than-memory state. I would say it should just be a scalar aggregate function if we were to implement it right now, though, since we haven't really started thinking about larger-than-memory state.
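
To make the O(k) versus O(n) point concrete, here is a minimal Python sketch of a streaming top-k that keeps only k values in its state. It is not the kernel in this PR and not Arrow's aggregate API: the `TopKState` class and its `consume`, `merge_from`, and `finalize` methods are hypothetical names chosen to loosely mirror the consume/merge/finalize shape an aggregate would need in order to handle streaming batches and parallel partial states. It assumes plain comparable values with no nulls.

```python
import heapq
from typing import Iterable, List


class TopKState:
    """Hypothetical aggregate state holding at most k values (not an Arrow API)."""

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        # Min-heap: the root is the smallest value currently kept.
        self._heap: List[float] = []

    def consume(self, batch: Iterable[float]) -> None:
        """Fold one batch into the state; memory stays bounded by k."""
        for value in batch:
            if len(self._heap) < self.k:
                heapq.heappush(self._heap, value)
            elif value > self._heap[0]:
                # Evict the current smallest kept value in O(log k).
                heapq.heapreplace(self._heap, value)

    def merge_from(self, other: "TopKState") -> None:
        """Combine a partial state, e.g. one built on another thread."""
        self.consume(other._heap)

    def finalize(self) -> List[float]:
        """Return the kept values, largest first."""
        return sorted(self._heap, reverse=True)


# Two partial states, as if two threads each saw part of the input.
left, right = TopKState(3), TopKState(3)
left.consume([5, 1, 9])
left.consume([7, 3])
right.consume([8, 2, 10])
left.merge_from(right)
print(left.finalize())  # [10, 9, 8]
```

Each state holds at most k values no matter how many batches it sees, which is the property a full sort cannot offer. The state itself is still assumed to fit in memory, so this does nothing for the larger-than-memory case mentioned above.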
