vibhatha commented on code in PR #13700:
URL: https://github.com/apache/arrow/pull/13700#discussion_r930155940
##########
cpp/src/arrow/compute/exec/options.h:
##########
@@ -430,6 +430,23 @@ class ARROW_EXPORT SelectKSinkNodeOptions : public SinkNodeOptions {
   SelectKOptions select_k_options;
 };
 
+/// \brief Make a node which selects a range of rows passed through it
+///
+/// All batches pushed to this node will be accumulated, then selected, by the given
+/// fields. Then sorted batches will be forwarded to the generator in sorted order and

Review Comment:
   TopK gives the K highest values (when sorted in descending order), right? Fetch should be able to pick rows from either sorted or unsorted data, starting at an offset. So with offset=0 and the data sorted in descending order, Fetch yields TopK. In that sense, I assume the Fetch node can be the most general node.
   
   The main issue with this implementation is that it is not optimized. There is a separate effort to optimize it so that it doesn't accumulate all the data in memory when a Fetch operation is done without sorting.
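
To make the Fetch/TopK relationship concrete, here is a minimal sketch that uses a plain `std::vector<int>` instead of Arrow batches. `Fetch` and `TopK` are hypothetical helpers written only for illustration; they are not Arrow APIs and do not reflect how the node in this PR is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not an Arrow API): return up to `count` rows
// starting at `offset`, preserving the input order. The input may be
// sorted or unsorted.
std::vector<int> Fetch(const std::vector<int>& rows, std::size_t offset,
                       std::size_t count) {
  if (offset >= rows.size()) return {};
  // Clamp the range so it never runs past the end of the input.
  const std::size_t length = std::min(count, rows.size() - offset);
  const auto first = rows.begin() + static_cast<std::ptrdiff_t>(offset);
  return std::vector<int>(first, first + static_cast<std::ptrdiff_t>(length));
}

// Hypothetical helper: TopK expressed as "sort descending, then Fetch
// with offset = 0". This still materializes and sorts the whole input.
std::vector<int> TopK(std::vector<int> rows, std::size_t k) {
  std::sort(rows.begin(), rows.end(), std::greater<int>());
  return Fetch(rows, 0, k);
}
```

In this sketch, `TopK` has to see every row before it can sort, while `Fetch` on unsorted input only needs the first `offset + count` rows. That is why a Fetch without sorting does not, in principle, need to accumulate all the data in memory.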