To put some off-list discussions about this on the record: it seems it
should be possible to support the 3-ary vector_distance signature in a
better way. The main change would be to make the last argument
effectively an enum that is evaluated at compile time, similar to some
of the WITH clauses in our DDL statements. That way a mistake like
vector_distance(v1,v2,"manhatan") would be caught up front instead of
causing unintended behavior. That risk was one of my main qualms with
this syntax style.
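
A minimal sketch of the idea, in Python purely for illustration. The
names Metric, resolve_metric and the Python vector_distance below are
hypothetical and not taken from the patch; the point is only that the
metric string is resolved to an enum once, before evaluation, so a
misspelling fails early and the shared distance code never sees a raw
string.

```python
import math
from enum import Enum


class Metric(Enum):
    """Hypothetical closed set of supported metrics."""
    EUCLIDEAN = "euclidean"
    MANHATTAN = "manhattan"


def resolve_metric(literal):
    """'Compile-time' step: map a constant string argument to a Metric or fail."""
    try:
        return Metric(literal.lower())
    except ValueError:
        supported = ", ".join(m.value for m in Metric)
        raise ValueError(
            f"unknown metric {literal!r}; expected one of: {supported}"
        ) from None


def vector_distance(u, v, metric):
    """'Runtime' step: only ever receives an already-validated Metric."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # The per-element differences are shared by both metrics.
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if metric is Metric.EUCLIDEAN:
        return math.sqrt(sum(d * d for d in diffs))
    if metric is Metric.MANHATTAN:
        return sum(diffs)
    raise AssertionError(f"unhandled metric {metric}")
```

With this shape, resolve_metric("manhatan") raises an error naming the
supported metrics, while resolve_metric("manhattan") yields a value
that can be handed straight to the shared implementation.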


On Tue, Dec 2, 2025 at 1:12 PM Mike Carey <[email protected]> wrote:
>
> This makes sense to me; it invites fewer errors (e.g., unsupported or
> misspelled metrics).  We could always support the string-y syntax with a
> thin logical rewrite if we wanted or needed that syntax as well for any reason.
>
> On 12/2/25 12:40 PM, Ian Maxon wrote:
> > Hi fellow devs,
> > There is a nice patch up by Calvin
> > (https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20126) that adds a
> > variety of distance functions for vectors. The initial patch added a
> > function like this:
> > vector_distance(u,v,metric)
> > which would compute the distance between two arrays of the same length
> > containing numerics, according to a metric given as a
> > string. The string would be, for example, "euclidean" or "manhattan".
> >
> > I wasn't particularly fond of this syntax; it was inspired by
> > something else, so this is not a slight against Calvin's work. After discussing
> > informally with Calvin and some others familiar with the patch, I
> > changed the patch to instead add a separate function for each metric,
> > like:
> > euclidean_dist(u,v)
> > manhattan_dist(u,v)
> > ...
> > and so on.  To me it seems this fits better with the naming and syntax
> > patterns we already have. Code-wise each function continues to share
> > most of its implementation with the other vector distance calculation
> > functions.
> >
> > Does anyone have any other thoughts or suggestions on the matter?
> >
> > -Ian
