[GitHub] [arrow] lidavidm commented on pull request #11019: ARROW-1565: [C++] Implement TopK/BottomK

GitBox Tue, 07 Sep 2021 09:17:00 -0700


lidavidm commented on pull request #11019:
URL: https://github.com/apache/arrow/pull/11019#issuecomment-914443252



   > > > Another thought: isn't one attraction of TopK to have a streaming 
algorithm with O(k) memory consumption? A full sort requires materializing 
the entire input, hence O(n) memory consumption.
   > > 
   > > 
   > > That is a fair point. However, this kernel as-is is not a streaming 
implementation and we would have to implement it as an actual aggregate kernel 
to get that behavior.
   > 
   > I have a question related to that: would the streaming implementation 
use ExecNodes?
   
   It's unclear to me. The API for scalar aggregate functions expects you to 
handle parallelism and streaming, but it doesn't handle larger-than-memory 
state. If we were to implement it right now, I would say it should just be a 
scalar aggregate function, since we haven't really started thinking about 
larger-than-memory state.
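
   For illustration only, here is a minimal sketch of the O(k) streaming idea 
discussed above, written in plain Python rather than against Arrow's C++ 
kernel API. The class name StreamingTopK and its consume/merge/finalize 
methods are hypothetical; they only loosely mirror the shape of an aggregate 
kernel that sees one batch at a time and combines per-thread states at the 
end. It assumes plain comparable values with no nulls.

```python
import heapq
from typing import Iterable, List


class StreamingTopK:
    """Hypothetical helper (not an Arrow API): keeps the k largest values seen."""

    def __init__(self, k: int) -> None:
        if k < 0:
            raise ValueError("k must be non-negative")
        self.k = k
        # Min-heap of at most k values; the root is the smallest retained value.
        self._heap: List[float] = []

    def consume(self, batch: Iterable[float]) -> None:
        """Fold one batch into the state without keeping the batch itself."""
        for value in batch:
            if len(self._heap) < self.k:
                heapq.heappush(self._heap, value)
            elif self._heap and value > self._heap[0]:
                # Evict the current smallest retained value.
                heapq.heapreplace(self._heap, value)

    def merge(self, other: "StreamingTopK") -> None:
        """Combine another partial state, e.g. one built on a different thread."""
        self.consume(other._heap)

    def finalize(self) -> List[float]:
        """Return the retained values, largest first."""
        return sorted(self._heap, reverse=True)


left = StreamingTopK(3)
right = StreamingTopK(3)
left.consume([5, 1, 9])
left.consume([7, 3])
right.consume([8, 2, 10])
left.merge(right)
print(left.finalize())  # [10, 9, 8]
```

   Each state holds at most k values regardless of how many batches it 
consumes, which is the memory property a full sort cannot offer. The state 
here always fits in memory, so the sketch says nothing about the 
larger-than-memory case.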


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
