[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

Tomoko Uchida (Jira) Sat, 31 Oct 2020 23:57:02 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224226#comment-17224226 ]


Tomoko Uchida commented on LUCENE-9583:
---------------------------------------

bq. I wonder if the graph approach really needs random access. For each node we 
need to access the list of neighbors so this pattern doesn't require random 
access because the list is already sorted by doc ids. So instead of adding 
another interface I wonder what do you think of adding a reset method in 
VectorValues ? For each node, the pattern to access would be to reset the 
iterator first and then move it to the first neighbor. We can make optimization 
internally to provide fast reset so that we don't need two implementations for 
the first two approaches that we foresee ?

I would prefer this approach: encourage forward-only iteration and call 
reset() when needed, even if it looks a bit non-intuitive for implementing 
graph-based aKNN search.
I feel that exposing public APIs for a "random" access pattern needs a more 
careful decision, and that we should start conservatively (as far as I 
understand, we have already discussed this several times and have not reached 
an agreement so far).
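
To make the reset-then-advance pattern concrete, here is a rough sketch. 
It is only an illustration: ForwardVectors, InMemoryForwardVectors and 
neighborVectors are hypothetical names made up for this comment, not the 
actual VectorValues API, and a real implementation would read from the index 
rather than from arrays held in memory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical forward-only view over per-document vectors (not Lucene's real API). */
interface ForwardVectors {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Moves forward to the first doc id >= target, never backwards; returns that doc id. */
  int advance(int target);

  /** Returns the vector of the document the iterator is currently positioned on. */
  float[] vectorValue();

  /** Rewinds the iterator so that iteration can start again from the first document. */
  void reset();
}

/** Assumed in-memory implementation, used only to make the sketch runnable. */
final class InMemoryForwardVectors implements ForwardVectors {
  private final int[] docIds; // sorted in ascending order
  private final float[][] vectors; // vectors[i] belongs to docIds[i]
  private int pos = -1; // -1 means "not positioned yet"

  InMemoryForwardVectors(int[] docIds, float[][] vectors) {
    this.docIds = docIds;
    this.vectors = vectors;
  }

  @Override
  public int advance(int target) {
    int i = Math.max(pos, 0); // start from the current position, so we never go back
    while (i < docIds.length && docIds[i] < target) {
      i++;
    }
    pos = i;
    return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
  }

  @Override
  public float[] vectorValue() {
    if (pos < 0 || pos >= docIds.length) {
      throw new IllegalStateException("iterator is not positioned on a document");
    }
    return vectors[pos];
  }

  @Override
  public void reset() {
    pos = -1;
  }
}

final class ResetIterationSketch {
  /** Collects the vectors of one node's neighbors; the neighbor list must be sorted by doc id. */
  static List<float[]> neighborVectors(ForwardVectors values, int[] sortedNeighborDocs) {
    values.reset(); // one reset per node, then forward-only access
    List<float[]> out = new ArrayList<>();
    for (int doc : sortedNeighborDocs) {
      if (values.advance(doc) == doc) {
        out.add(values.vectorValue());
      }
    }
    return out;
  }
}
```

In this sketch the caller pays one reset() per visited node, and everything 
else is plain forward iteration. Whether reset() can be made cheap enough in 
a real codec is the open question from the quoted comment.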

> How should we expose VectorValues.RandomAccess?
> -----------------------------------------------
>
>                 Key: LUCENE-9583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} 
> sub-interface. [~jtibshirani] pointed out this is not needed by some 
> vector-indexing strategies which can operate solely using a forward-iterator 
> (it is needed by HNSW), and so in the interest of simplifying the public API 
> we should not expose this internal detail (which by the way surfaces internal 
> ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered 
> that we do also currently make use of the RA API when merging vector fields 
> over sorted indexes. Without it, we would need to load all vectors into RAM  
> while flushing/merging, as we currently do in 
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost 
> for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN 
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} 
>  to {{VectorValues.RandomAccess}}. This I think we could move back, and 
> handle the HNSW requirements for search elsewhere. I wonder if that would 
> alleviate the major concern here? 



