From Ali Alsuliman <[email protected]>: Attention is currently required from: Calvin Thomas Dani, Ian Maxon.
Ali Alsuliman has posted comments on this change by Calvin Thomas Dani. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126?usp=email ) Change subject: [ASTERIXDB-3676] Vector Distance KNN Function ...................................................................... Patch Set 19: (17 comments) File asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/util/CommonFunctionMapUtil.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/969b42c3_461e403d?usp=email : PS19, Line 174: addFunctionMapping("euclidean_dist", "euclidean-dist"); : addFunctionMapping("manhattan_dist", "manhattan-dist"); : addFunctionMapping("dot_dist", "dot-dist"); : addFunctionMapping("cosine_dist", "cosine-dist"); I don't think you need to add the ones having only '-' and '_' differences. They should be handled by some other code. Check and let me know. File asterixdb/asterix-om/src/main/java/org/apache/asterix/om/types/ATypeTag.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/ce9f9f2b_84339f97?usp=email : PS19, Line 138: public boolean isNumericType() { You can use ATypeHierarchy.getTypeDomain) == ATypeHierarchy.Domain.NUMERIC File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/vector/CosineDistanceArrDescriptor.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/a38a9267_8470a031?usp=email : PS19, Line 37: DescriptorFactoryUtil : .createFactory(CosineDistanceArrDescriptor::new, FunctionTypeInferers.SET_ARGUMENTS_TYPE); Typically the descriptors that implement type inferer also implement `public void setImmutableStates` to capture the arguments types and then pass them to the function evaluator. I don't see you need the arguments types. If not, then you can just simplify the descriptor factory as: ``` public final static IFunctionDescriptorFactory FACTORY = CosineDistanceArrDescriptor:new; ``` File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/vector/VectorDistanceArrScalarEvaluator.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/807501e7_af9904bb?usp=email : PS19, Line 63: VectorDistanceArrScalarEvaluator For all the instance variables, start by making them `private final` and then open up when needed. Also organize a bit so that they are next to each and interface declarations are next to each other. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/5da6263f_a876ce49?usp=email : PS19, Line 91: listAccessorConstant I don't think you need two separate listAccessor & listAccessorConstant. You should be able to do it using only listAccessor where you create list accessors beforehand and then just re-use those objects. Another note is you can also just simplify it by creating two variables instead of an array. This note is applicable to all the other array[2] variables and then you get rid of the for-loops and instead just refactor the code so that you can call it for arg1 and arg2. Give this a try and see how it looks. The goal is to make the code simpler. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/ec309be2_94ce4175?usp=email : PS19, Line 113: if (!ATYPETAGDESERIALIZER.deserialize(pointables[i].getByteArray()[pointables[i].getStartOffset()]) : .isListType()) Does this mean it is going to throw an exception if you pass a NULL/MISSING constant arg? e.g. `euclidean(arg1, NULL);` https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/d379f6fd_01089963?usp=email : PS19, Line 140: listAccessor[i] = new ListAccessor(); As I said, for variable arg, we should not create this list accessor each time we evaluate a tuple. You should be able to re-use the two list accessors that you created beforehand and just call `listAccessor.reset()`. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/76b9a80c_2d6e203c?usp=email : PS19, Line 160: PointableHelper.setNull(result); We typically return MISSING if there is any arg MISSING, then NULL if there is any arg NULL which is taken care of by `checkAndSetMissingOrNull()`. Are you purposefully returning always NULL? https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/ff30fc10_ad09dd99?usp=email : PS19, Line 174: double[] primitiveArrayLeft = : isConstant[0] ? primitiveArrayConstant[0] : createArrayFromList(leftListAccessor, vectorArrays[0]); : double[] primitiveArrayRight = : isConstant[1] ? primitiveArrayConstant[1] : createArrayFromList(rightListAccessor, vectorArrays[1]); We need to handle the less common case where each evaluated tuple has different arguments length than the previous tuple, e.g.: ``` tuple1: arg1=[1,2,3], arg2=[1,2,3] tuple2: arg1=[1,2,3,4], arg2=[1,2,3,4] ``` I believe this is less common, right? If yes, then we can handle it simply by resetting the double[] variables to a new double[] with the new length. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/18dcd68e_111e5a0c?usp=email : PS19, Line 190: result.set(resultStorage); You can move this `result.set(resultStorage);` up inside the try-block after `writeResult(distanceCal, dataOutput);` https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/7ad92839_cea3ac4a?usp=email : PS19, Line 194: AMutableDouble aDouble = new AMutableDouble(-1); Same idea. You should be able to create this object only once beforehand and re-use it here instead of creating one for each evaluated tuple in the `evaluate()` https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/d25f04f8_c02d9516?usp=email : PS19, Line 203: IPointable tempVal = new VoidPointable(); : ArrayBackedValueStorage storage = new ArrayBackedValueStorage(); Same idea. You should be able to create those objects only once beforehand and re-use them here instead of creating them for each evaluated tuple in the `evaluate()` https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/28cb07a1_d2ae51b8?usp=email : PS19, Line 212: extractNumericVector I believe that we can simplify it to something like: ``` byte[] data = pointable.getByteArray(); int offset = pointable.getStartOffset(); return switch (data[offset]) { case SERIALIZED_INT8_TYPE_TAG -> AInt8SerializerDeserializer.getByte(data, offset + 1); case SERIALIZED_INT16_TYPE_TAG -> AInt16SerializerDeserializer.getShort(data, offset + 1); case SERIALIZED_INT32_TYPE_TAG -> AInt32SerializerDeserializer.getInt(data, offset + 1); case SERIALIZED_INT64_TYPE_TAG -> AInt64SerializerDeserializer.getLong(data, offset + 1); case SERIALIZED_FLOAT_TYPE_TAG -> AFloatSerializerDeserializer.getFloat(data, offset + 1); case SERIALIZED_DOUBLE_TYPE_TAG -> ADoubleSerializerDeserializer.getDouble(data, offset + 1); default -> Double.NaN; }; ``` https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/a3b8e231_5d38eb64?usp=email : PS19, Line 221: DISTANCE_FN_MAP.get(func) Why not use the funcId directly and remove this map? File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/utils/VectorDistanceArrCalculation.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/674521ea_4a01c89a?usp=email : PS19, Line 28: sum += diff * diff; Is there a possibility that this could overflow/underflow? The same question for the other methods here. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/464cd9ba_0f919c8b?usp=email : PS19, Line 30: if (Double.isNaN(sum)) { This check is redundant. Math.sqrt(sum) already returns NaN if sum is NaN. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126/comment/c2e0140b_970d50b4?usp=email : PS19, Line 37: double sum = 0.0; : for (int i = 0; i < a.length; i++) { : double diff = a[i] - b[i]; : sum += diff * diff; : } : if (Double.isNaN(sum)) { : return Double.NaN; // Handle NaN case : } This part could be refactored with euclidean() -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126?usp=email To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email Gerrit-MessageType: comment Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: I45176e7da13ed222f48bb36bd1f0ab4074a0bcb9 Gerrit-Change-Number: 20126 Gerrit-PatchSet: 19 Gerrit-Owner: Calvin Thomas Dani <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Attention: Calvin Thomas Dani <[email protected]> Gerrit-Attention: Ian Maxon <[email protected]> Gerrit-Comment-Date: Wed, 18 Feb 2026 13:41:34 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No
