The Vector API ships as part of OpenJDK itself rather than as a
third-party dependency, so I think the licensing should be OK:
https://openjdk.org/jeps/508

The main problem is that it isn't a stable API yet: it is still an
incubator module, and it won't be finalized until the Project Valhalla
features it depends on are available. It would be a judgement call on
how much we expect it to change over time, and how difficult it would be
to migrate our code to follow those changes. It would also be a bet that,
by the time everything is done, this set of JDK features has more or
less stabilized.
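
To give a sense of what we would be betting on, here is a minimal
sketch of a dot product and Euclidean distance written against the
incubating API. It assumes a recent JDK and that both javac and java are
given --add-modules jdk.incubator.vector; the class and method names are
made up for illustration. Every import comes from a jdk.incubator.*
package, which is exactly the surface that would presumably change once
the API is finalized.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper class: SIMD distance functions on the incubating Vector API.
final class VectorDistance {
    // Widest float shape the running CPU supports (e.g. 8 lanes with AVX2).
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        checkLengths(a, b);
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length); // largest multiple of the lane count <= length
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc); // lane-wise acc += va * vb
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) { // scalar tail for elements that don't fill a vector
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float euclidean(float[] a, float[] b) {
        checkLengths(a, b);
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector diff = FloatVector.fromArray(SPECIES, a, i)
                    .sub(FloatVector.fromArray(SPECIES, b, i));
            acc = diff.fma(diff, acc); // lane-wise acc += diff * diff
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }

    private static void checkLengths(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        float[] b = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(dot(a, b));       // 220.0
        System.out.println(euclidean(a, b)); // sqrt(330), about 18.17
    }
}
```

The JVM prints a "Using incubator modules" warning at startup. Also,
reduceLanes may sum lanes in a different order than a scalar loop, so
floating-point results can differ from a plain Java version in the last
bits, which matters if we compare the two for correctness.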

Using FFI/JNI would be a more traditional way to go about it. FFI (the
Foreign Function & Memory API) is newer than JNI and doesn't require
writing C glue code, so if we choose to go with that, it should be less
painful. It was also finalized in Java 22 (JEP 454), so it is a stable
API, which is much less risky than an incubating one.
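
For comparison, a minimal sketch of an FFM downcall, assuming Java 22+
and a hypothetical native library that exports vec_dot with the C
signature in the comment below. Neither the symbol nor the library
exists; the default lookup is used only so the sketch runs without a
native build, which means it falls back to plain Java.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical wrapper around a native function with the C signature
//   float vec_dot(const float *a, const float *b, size_t n);
// Falls back to plain Java when the symbol can't be found.
final class NativeDot {
    private static final FunctionDescriptor VEC_DOT = FunctionDescriptor.of(
            ValueLayout.JAVA_FLOAT,   // return: float
            ValueLayout.ADDRESS,      // const float *a
            ValueLayout.ADDRESS,      // const float *b
            ValueLayout.JAVA_LONG);   // size_t n (assumes a 64-bit platform)

    private final MethodHandle vecDot; // null if the symbol is missing

    NativeDot(SymbolLookup lookup) {
        this.vecDot = lookup.find("vec_dot")
                .map(addr -> Linker.nativeLinker().downcallHandle(addr, VEC_DOT))
                .orElse(null);
    }

    boolean usesNative() {
        return vecDot != null;
    }

    float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        if (vecDot == null) { // scalar Java fallback
            float sum = 0f;
            for (int i = 0; i < a.length; i++) {
                sum += a[i] * b[i];
            }
            return sum;
        }
        // Confined arena: the off-heap copies are freed when the try block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segA = arena.allocateFrom(ValueLayout.JAVA_FLOAT, a);
            MemorySegment segB = arena.allocateFrom(ValueLayout.JAVA_FLOAT, b);
            return (float) vecDot.invokeExact(segA, segB, (long) a.length);
        } catch (Throwable t) {
            throw new RuntimeException("vec_dot downcall failed", t);
        }
    }

    public static void main(String[] args) {
        // The default lookup only covers common system libraries (e.g. libc),
        // so vec_dot is not found here and the Java fallback runs.
        NativeDot nd = new NativeDot(Linker.nativeLinker().defaultLookup());
        System.out.println(nd.usesNative() + " " + nd.dot(new float[] {1, 2, 3}, new float[] {4, 5, 6}));
    }
}
```

Copying arrays into an arena on every call has a cost; if embeddings
were kept off-heap in MemorySegments to begin with, the native code
could read them without the copy. Restricted methods such as
downcallHandle may also print a warning unless the JVM is started with
--enable-native-access=ALL-UNNAMED.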

There is also the JNA project, which wraps JNI to make it simpler:
https://github.com/java-native-access/jna . I'm assuming the libraries
we might want to use are mostly computational, so they wouldn't have
many platform-specific dependencies, just architecture-specific ones. I
think JNA also handles the packaging side (loading native libraries
bundled in the jar), which FFI doesn't do directly. Unless the libraries
we want are in libc or can otherwise be assumed to be present on the
target machine, we would have to include them in the jar somehow.
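
If we bundled native libraries ourselves instead of relying on JNA, one
common approach is to ship one build per OS/architecture as jar
resources and extract the matching one to a temporary file at startup,
since the OS loader can't load a library straight out of a jar. A
minimal sketch using the FFM SymbolLookup.libraryLookup (Java 22+); the
/native/<os>-<arch>/ layout and the vecmath name are hypothetical. As
far as I know, JNA automates roughly this by searching the classpath
under a per-platform resource prefix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.SymbolLookup;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;
import java.util.Optional;

// Hypothetical jar layout: /native/<os>-<arch>/<platform library file name>,
// e.g. /native/linux-x86_64/libvecmath.so or /native/windows-x86_64/vecmath.dll
final class BundledLibrary {
    static String resourcePath(String baseName) {
        String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
        String osTag = os.startsWith("windows") ? "windows"
                : os.startsWith("mac") ? "macos"
                : "linux"; // simplification: treat everything else as Linux
        String arch = System.getProperty("os.arch").toLowerCase(Locale.ROOT);
        String archTag = switch (arch) {
            case "amd64", "x86_64" -> "x86_64";
            case "aarch64", "arm64" -> "aarch64";
            default -> arch;
        };
        // mapLibraryName turns "vecmath" into libvecmath.so, libvecmath.dylib or vecmath.dll.
        return "/native/" + osTag + "-" + archTag + "/" + System.mapLibraryName(baseName);
    }

    // Extracts the library for this platform to a temp file and opens it via FFM.
    // Returns empty when the jar has no build for this platform.
    static Optional<SymbolLookup> load(String baseName, Arena arena) {
        String resource = resourcePath(baseName);
        try (InputStream in = BundledLibrary.class.getResourceAsStream(resource)) {
            if (in == null) {
                return Optional.empty();
            }
            Path tmp = Files.createTempFile("native-", "-" + System.mapLibraryName(baseName));
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return Optional.of(SymbolLookup.libraryLookup(tmp, arena));
        } catch (IOException e) {
            throw new UncheckedIOException("could not extract " + resource, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("vecmath"));
        System.out.println(load("vecmath", Arena.global()).isPresent()); // false unless bundled
    }
}
```

The returned SymbolLookup could then be handed to the downcall code
instead of the default lookup. The build side (producing one binary per
platform in CI) is still on us.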


On Tue, Jun 10, 2025 at 8:27 AM Mike Carey <dtab...@gmail.com> wrote:
>
> Q:  Are there licensing gotchas with approach 1 (which otherwise sounds
> nicer from a maintenance standpoint)? We need to be sure that everything
> we use is Apache-okay in terms of licensing.  It would be fun to see
> some preliminary numbers on perf, e.g., for KNN, each way, were it as
> easy as changing which function(s) to call...  :-)  That would help
> quantify the two options (vs. each other and vs. none) too.
>
> On 6/10/25 7:24 AM, Calvin Dani wrote:
> > Hi,
> >
> > As part of adding vector functionality to AsterixDB, I have been exploring
> > possible optimizations for vector computations. One promising direction is
> > leveraging SIMD operations to accelerate these calculations. Although the
> > JIT compiler can autovectorize loops to use SIMD, this generally requires
> > the loop body to be branchless (i.e., no conditional branching like
> > if/else), and it may not be triggered at all when the vector calculations
> > get more complex.
> >
> > I have considered two main options for SIMD-enabled vector computation:
> >
> > 1. Java Vector API: Introduced as an incubating feature in Java 16, the
> > Vector API comes out of Project Panama. It remains in incubation and
> > likely won't be finalized until the features it needs from Project
> > Valhalla are available, but the API already supports the basic
> > operations needed for our distance metrics, such as Euclidean Distance,
> > Manhattan Distance, Cosine Similarity, and Dot Product. Its Vector<E>
> > types (e.g., FloatVector) map onto fixed-width SIMD registers, so
> > embeddings would still be stored in arrays or memory segments and
> > loaded into vectors for computation.
> >
> > 2. Foreign Function & Memory API: This allows calling optimized C/C++
> > libraries directly from Java. We could either leverage existing
> > highly-optimized vector computation libraries or implement our own native
> > code. However, packaging and ensuring compatibility of native libraries
> > across different target platforms may introduce complexity.
> >
> > If you are aware of other solutions or have feedback on these options, I
> > would appreciate your insights.
> >
> > Thank you,
> > Calvin Dani
> >
