To bring some off-list discussion about this back onto the list: it seemed like it should be possible to support the 3-ary vector_distance signature in a better way. The main change would be to make the last argument essentially an enum that is resolved at compile time, similar to some of the WITH clauses in our DDLs. That way a mistake like vector_distance(v1, v2, "manhatan") cannot silently cause unintended behavior, because it would be rejected when the query is compiled. This was one of my main qualms with this syntax style.
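A rough Java sketch of the compile-time idea follows. It assumes a hypothetical DistanceMetric enum and a hypothetical compile() helper. Neither name comes from Calvin's patch. The metric name is resolved once, when the call is compiled, so a typo like "manhatan" fails before any data is read. The per-tuple work is then a shared kernel parameterized by the already-resolved metric.

```java
import java.util.function.ToDoubleBiFunction;

public class VectorDistanceSketch {

    // Hypothetical set of supported metrics; the real patch may differ.
    enum DistanceMetric {
        EUCLIDEAN,
        MANHATTAN;

        // Resolve a metric name once, at query-compile time.
        // A misspelled or unsupported name fails here, before any data is read.
        static DistanceMetric resolve(String name) {
            for (DistanceMetric m : values()) {
                if (m.name().equalsIgnoreCase(name)) {
                    return m;
                }
            }
            throw new IllegalArgumentException("Unsupported distance metric: " + name);
        }
    }

    // Shared kernel: every metric reuses the same loop over both vectors.
    static double distance(double[] u, double[] v, DistanceMetric metric) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double acc = 0.0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            // Euclidean accumulates squared differences, Manhattan absolute ones.
            acc += (metric == DistanceMetric.EUCLIDEAN) ? d * d : Math.abs(d);
        }
        return (metric == DistanceMetric.EUCLIDEAN) ? Math.sqrt(acc) : acc;
    }

    // Hypothetical "compile" step for vector_distance(u, v, <metric>):
    // the metric is bound once and the returned function is evaluated per tuple.
    static ToDoubleBiFunction<double[], double[]> compile(String metricName) {
        DistanceMetric metric = DistanceMetric.resolve(metricName);
        return (u, v) -> distance(u, v, metric);
    }

    public static void main(String[] args) {
        double[] u = {0.0, 0.0};
        double[] v = {3.0, 4.0};
        System.out.println(compile("euclidean").applyAsDouble(u, v)); // 5.0
        System.out.println(compile("manhattan").applyAsDouble(u, v)); // 7.0
        try {
            compile("manhatan");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unsupported distance metric: manhatan
        }
    }
}
```

In this sketch, per-metric functions like euclidean_dist(u,v) could be thin wrappers that call distance() with a fixed metric. The string-y form Mike mentions could be handled the same way, as a rewrite that calls resolve() on a constant argument.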
On Tue, Dec 2, 2025 at 1:12 PM Mike Carey wrote:
>
> This makes sense to me; it invites fewer errors (e.g., unsupported or
> misspelled metrics). We could always support the string-y syntax with a
> thin logical rewrite if we wanted/needed that syntax as well for any reason.
>
> On 12/2/25 12:40 PM, Ian Maxon wrote:
> > Hi fellow devs,
> > There is a nice patch up by Calvin
> > (https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126) that adds a
> > variety of distance functions for vectors. The initial patch added a
> > function like this:
> > vector_distance(u,v,metric)
> > Which would compute the distance between two arrays of the same length
> > containing numerics, according to a metric, which is given as a
> > string. The string would be, for example "euclidean", or "manhattan".
> >
> > I wasn't particularly fond of this syntax- it was inspired from
> > something else, and not a slight at Calvin's work. After discussing
> > informally with Calvin and some others familiar with the patch, I
> > changed the patch to instead add a separate function for each metric,
> > like:
> > euclidean_dist(u,v)
> > manhattan_dist(u,v)
> > ...
> > and so on. To me it seems this fits better with the naming and syntax
> > patterns we already have. Code-wise each function continues to share
> > most of its implementation with the other vector distance calculation
> > functions.
> >
> > Does anyone have any other thoughts or suggestions on the matter?
> >
> > -Ian
