vibhatha commented on code in PR #13700:
URL: https://github.com/apache/arrow/pull/13700#discussion_r930155940


##########
cpp/src/arrow/compute/exec/options.h:
##########
@@ -430,6 +430,23 @@ class ARROW_EXPORT SelectKSinkNodeOptions : public SinkNodeOptions {
   SelectKOptions select_k_options;
 };
 
+/// \brief Make a node which selects a range of rows passed through it
+///
+/// All batches pushed to this node will be accumulated, then selected, by the given
+/// fields. Then sorted batches will be forwarded to the generator in sorted order and

Review Comment:
   TopK gives the highest K values (when sorted in descending order), right?
   Fetch should be able to pick rows from either sorted or unsorted data, starting at an offset. So with offset=0 and a descending sort, we get TopK. In that sense, I'd expect Fetch to be the most general of these nodes.
   
   The main issue with this implementation is that it is not optimized: all batches are accumulated in memory before any rows are selected. There is a separate effort to optimize it so that a Fetch without sorting does not accumulate all the data in memory.
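   For the unsorted case, the idea can be sketched roughly as below. This is a hypothetical illustration on plain vectors, not Arrow's exec-node API and not the implementation in that other effort: each batch is trimmed as it arrives, and the only state kept is two counters, so memory use does not grow with the input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not Arrow's implementation): an unsorted fetch that
// processes each batch as it arrives instead of accumulating all of them.
class StreamingFetch {
 public:
  StreamingFetch(std::size_t offset, std::size_t count)
      : to_skip_(offset), remaining_(count) {}

  // Returns the rows of `batch` that fall inside [offset, offset + count)
  // of the overall stream, updating the skip/remaining counters.
  std::vector<int64_t> Consume(const std::vector<int64_t>& batch) {
    std::size_t start = std::min(to_skip_, batch.size());
    to_skip_ -= start;
    std::size_t take = std::min(remaining_, batch.size() - start);
    remaining_ -= take;
    auto first = batch.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<int64_t>(first, first + static_cast<std::ptrdiff_t>(take));
  }

  // True once `count` rows have been emitted.
  bool Done() const { return remaining_ == 0; }

 private:
  std::size_t to_skip_;    // rows still to skip before emitting
  std::size_t remaining_;  // rows still to emit
};
```

   Once `Done()` returns true, a node built this way could stop requesting input rather than draining the whole source.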



-- 
This is an automated message from the Apache Git Service.