GitHub user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/108#discussion_r138024379
--- Diff: core/src/main/java/hivemall/tools/map/UDAFToOrderedMap.java ---
@@ -54,19 +68,35 @@ public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo info)
"Only primitive type arguments are accepted for the key but "
+ typeInfo[0].getTypeName() + " was passed as parameter 1.");
}
+
boolean reverseOrder = false;
+ int size = 0;
if (typeInfo.length == 3) {
- if (HiveUtils.isBooleanTypeInfo(typeInfo[2]) == false) {
- throw new UDFArgumentTypeException(2, "The three argument must be boolean type: "
- + typeInfo[2].getTypeName());
- }
ObjectInspector[] argOIs = info.getParameterObjectInspectors();
- reverseOrder = HiveUtils.getConstBoolean(argOIs[2]);
+ if (HiveUtils.isBooleanTypeInfo(typeInfo[2])) {
+ reverseOrder = HiveUtils.getConstBoolean(argOIs[2]);
+ } else if (HiveUtils.isIntegerTypeInfo(typeInfo[2])) {
+ size = HiveUtils.getConstInt(argOIs[2]);
+ if (size == 0) {
+ throw new UDFArgumentException("Map size must be nonzero: " + size);
+ }
+ reverseOrder = (size > 0); // positive size => top-k
+ } else {
+ throw new UDFArgumentTypeException(2,
+ "The third argument must be boolean or integer type: "
+ + typeInfo[2].getTypeName());
+ }
}
- if (reverseOrder) {
+ if (reverseOrder) { // descending
--- End diff --
It would be better to implement a `BoundedSortedMap` to avoid duplicated code and a memory-inefficient top-k operation.
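A minimal sketch of what such a `BoundedSortedMap` could look like, assuming it is a `TreeMap` subclass that evicts the last entry (in the map's ordering) once a hypothetical `bound` is exceeded. The class shape, constructor parameters, and eviction rule are illustrative assumptions, not Hivemall's actual API. Holding at most k entries keeps memory proportional to k rather than to the number of input rows, and each insert costs O(log k).

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a size-bounded sorted map (not Hivemall's actual class).
 * Keeps at most {@code bound} entries; on overflow the last entry in the map's
 * ordering is evicted, so with reverseOrder=true the k largest keys survive (top-k)
 * and with natural order the k smallest keys survive.
 * Only put/putAll enforce the bound here; other TreeMap mutators are not overridden.
 */
class BoundedSortedMap<K extends Comparable<K>, V> extends TreeMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int bound;

    BoundedSortedMap(int bound, boolean reverseOrder) {
        super(reverseOrder ? Collections.<K>reverseOrder() : Comparator.<K>naturalOrder());
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive: " + bound);
        }
        this.bound = bound;
    }

    @Override
    public V put(K key, V value) {
        // A new key that sorts after the current last entry would be evicted
        // immediately, so skip it without touching the tree.
        if (size() >= bound && !containsKey(key) && comparator().compare(key, lastKey()) > 0) {
            return null;
        }
        V previous = super.put(key, value);
        if (size() > bound) {
            pollLastEntry(); // drop the entry that fell outside the bound
        }
        return previous;
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> map) {
        // Route through put() so TreeMap's bulk-load fast path cannot bypass the bound.
        for (Map.Entry<? extends K, ? extends V> e : map.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }
}
```

With `reverseOrder = true` and a bound of 3, inserting keys 1 through 10 leaves only 10, 9 and 8, which matches the `size > 0` (top-k) branch in the diff; a natural-order map would keep the k smallest keys, presumably covering the negative-size case. Both the boolean and integer code paths could then share one map type instead of duplicating the ordering logic.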
---