Anand,

In short, I think it's feasible, but I don't think it's simple. I also don't think Lucene should directly provide an interface to the format that says "Give me the graph". You could, however, have a custom writer that does this.
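
To make the custom writer idea a bit more concrete, here is a rough sketch of the preparation step such a writer would probably need before it can write anything: turning the neighbor lists produced by the GPU library into lists addressed by dense vector ordinal. It is plain Java and calls no Lucene API. The class `GraphImportSketch` and both methods are hypothetical names, and it rests on two assumptions: that ordinals within a segment are assigned in ascending doc ID order, and that only the bottom graph level (which contains every node) is being handled.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper, not part of Lucene: prepares level 0 of an externally built graph. */
final class GraphImportSketch {

  /**
   * docIds[i] is the doc ID of external node i (all distinct).
   * Returns ordOf, where ordOf[i] is the dense ordinal of node i.
   * Assumption: ordinals follow ascending doc ID order.
   */
  static int[] ordinalsByDocId(int[] docIds) {
    Integer[] nodes = new Integer[docIds.length];
    for (int i = 0; i < nodes.length; i++) {
      nodes[i] = i;
    }
    // Sort external node indices by the doc ID they belong to.
    Arrays.sort(nodes, Comparator.comparingInt(n -> docIds[n]));
    int[] ordOf = new int[docIds.length];
    for (int ord = 0; ord < nodes.length; ord++) {
      ordOf[nodes[ord]] = ord;
    }
    return ordOf;
  }

  /**
   * neighbors[i] lists the external node indices connected to node i.
   * Returns the same graph indexed by ordinal, each neighbor list sorted ascending.
   * Rejects lists longer than maxConn, self-loops, and out-of-range neighbors.
   */
  static int[][] remapLevel(int[][] neighbors, int[] ordOf, int maxConn) {
    int[][] out = new int[neighbors.length][];
    for (int node = 0; node < neighbors.length; node++) {
      int[] src = neighbors[node];
      if (src.length > maxConn) {
        throw new IllegalArgumentException("node " + node + " has more than " + maxConn + " neighbors");
      }
      int[] mapped = new int[src.length];
      for (int j = 0; j < src.length; j++) {
        int nb = src[j];
        if (nb < 0 || nb >= ordOf.length || nb == node) {
          throw new IllegalArgumentException("bad neighbor " + nb + " for node " + node);
        }
        mapped[j] = ordOf[nb];
      }
      Arrays.sort(mapped);
      out[ordOf[node]] = mapped;
    }
    return out;
  }
}
```

For example, with doc IDs `{7, 2, 5}` the ordinals come out as `{2, 0, 1}`, and the neighbor lists `{{1, 2}, {0}, {0, 1}}` become `{{2}, {0, 2}, {0, 1}}`. The upper levels, the entry point, and the actual on-disk encoding are deliberately left out, since those depend on the specific format version you target.
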

All formats are looked up by name, so if your GPU merge format writes out the appropriate name and format, it should be readable.

> One issue we have been running into is long build times with higher
> dimensional vectors.

Are you building the graph with a single thread? What vector dimensions are you using?

As an aside, building the graph via quantized vectors can help speed things up, though I understand the desire to do graph building with a GPU.

Very interesting ideas indeed, Anand.

Ben

On Fri, Jun 14, 2024 at 4:49 AM Anand Kotriwal <anand.kotri...@gmail.com> wrote:
>
> Hi all,
>
> We extensively use Lucene and its HNSW graph search capability for ANN searches.
> One issue we have been running into is long build times with higher
> dimensional vectors. To address this, we are exploring ways to
> build the HNSW index on the GPU and merge it into an existing Lucene index to
> serve queries. For example, Nvidia's cuVS library supports building a CAGRA
> index and transforming it into an hnswlib graph.
>
> My idea is: once the HNSW graph is built on the GPUs, we can import the
> graph. We need the graph vertices and their connections. We can then write it
> to a Lucene-compatible segment file format. We also map the docids to
> embeddings and update the fieldinfos.
>
> I would like feedback from the community on whether this sounds feasible and
> any implementation pointers you might have.
>
>
> Thanks,
> Anand Kotriwal