edponce commented on a change in pull request #11019:
URL: https://github.com/apache/arrow/pull/11019#discussion_r700603134
##########
File path: cpp/src/arrow/compute/api_vector.cc
##########
@@ -140,6 +144,15 @@ PartitionNthOptions::PartitionNthOptions(int64_t pivot)
: FunctionOptions(internal::kPartitionNthOptionsType), pivot(pivot) {}
constexpr char PartitionNthOptions::kTypeName[];
+SelectKOptions::SelectKOptions(int64_t k, std::vector<std::string> keys, std::string keep,
+ SortOrder order)
+ : FunctionOptions(internal::kSelectKOptionsType),
+ k(k),
+ keys(std::move(keys)),
+ keep(keep),
+ order(order) {}
+constexpr char SelectKOptions::kTypeName[];
+
Review comment:
The select-K algorithm is a general approach for getting the topK, bottomK,
or median statistic. It seems that the `SortOrder` option is always `Descending`
for topK and `Ascending` for bottomK, so I recommend using an enum for the type
of statistic desired instead of specifying an ordering. As a user, if I specify
`Ascending`, it is not intuitive whether that corresponds to topK or bottomK,
because it depends on which side the sorted data is searched from.
```
enum class SelectionOperator {
  TOPK,
  BOTTOMK,
  MEDIAN,  // possibly for another PR
};
```
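To illustrate the idea, here is a minimal standalone sketch of how such an enum could pick the comparison direction internally, so the caller never has to reason about `Ascending` vs `Descending`. It uses `std::partial_sort` on a plain `std::vector<int64_t>` rather than Arrow arrays, and the free function `SelectK` is a hypothetical name used only for this example, not something in the PR.
```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical enum, same shape as the one suggested above (MEDIAN omitted).
enum class SelectionOperator { TOPK, BOTTOMK };

// Illustration only: returns the k largest (TOPK) or k smallest (BOTTOMK)
// values, best first. The ordering is derived from the operator.
std::vector<int64_t> SelectK(std::vector<int64_t> values, int64_t k,
                             SelectionOperator op) {
  // Clamp k to [0, values.size()] so the iterator arithmetic stays valid.
  const int64_t n = std::min<int64_t>(std::max<int64_t>(k, 0),
                                      static_cast<int64_t>(values.size()));
  auto middle = values.begin() + n;
  if (op == SelectionOperator::TOPK) {
    // Largest first: sort the first n slots in descending order.
    std::partial_sort(values.begin(), middle, values.end(),
                      std::greater<int64_t>());
  } else {
    // Smallest first: sort the first n slots in ascending order.
    std::partial_sort(values.begin(), middle, values.end(),
                      std::less<int64_t>());
  }
  values.resize(static_cast<std::size_t>(n));
  return values;
}
```
With this shape, `SelectK(data, 2, SelectionOperator::TOPK)` states the intent directly, and the sort direction becomes an implementation detail.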
Also, I am not sure that the `keys` and `keep` options are part of common
selectK APIs. I think it would be better structured to have a sorter data
member that represents the options for the sorting algorithm.
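As a rough sketch of that structure (all names here are hypothetical and do not derive from `FunctionOptions`, so this is not the PR's actual class), the sort-related fields could be grouped into one member and kept apart from the selection parameters:
```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical enum, as suggested earlier in this comment.
enum class SelectionOperator { TOPK, BOTTOMK };

// Hypothetical grouping of the sort-related options from the diff.
struct SorterOptions {
  std::vector<std::string> keys;  // columns to sort by
  std::string keep;               // tie-handling policy, kept as a string as in the diff
};

// Illustration only: selection parameters plus a nested sorter member.
struct SelectKOptionsSketch {
  SelectKOptionsSketch(int64_t k, SelectionOperator op, SorterOptions sorter = {})
      : k(k), op(op), sorter(std::move(sorter)) {}

  int64_t k;
  SelectionOperator op;
  SorterOptions sorter;
};
```
A caller that only needs the top three values could then write `SelectKOptionsSketch(3, SelectionOperator::TOPK)` and leave the sorter at its defaults.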