[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401172#comment-17401172
 ] 

Mayya Sharipova edited comment on LUCENE-10054 at 8/20/21, 7:39 PM:
--------------------------------------------------------------------

Current .vem index file structure:

 
{code:java}
+-------------+--------+----------+----------+----------+---------+-------
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength| dims  
+-------------+--------+----------+----------+----------+---------+-------

+------+--------+--------------+
| size | docIds | graphOffsets | 
+------+--------+--------------+
{code}
 
 * FieldNumber: the number of the field
 * SimFun: the ordinal of the similarity function
 * VDOffset: the offset in the vector data file (.vec file) where the original 
vector values are stored
 * VDLength: the length of the vector data for this field
 * VIOffset: the offset in the vector index file (.vex file) where the nodes' 
connections are stored
 * VILength: the length of the vector index for this field
 * dims: the vector field's number of dimensions
 * size: the total number of documents with this vector field
 * docIds: the ids of the documents with this vector field
 * graphOffsets: for each document's vector, the offset in the .vex file where 
its connections are stored
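
To make the field order concrete, here is a minimal sketch of one per-field entry of the current layout, written and read back in the order shown in the diagram. The class name VemFieldEntry, the use of java.nio.ByteBuffer, and the fixed int/long widths are assumptions made for illustration only; they are not taken from the Lucene codec, which has its own I/O classes and encodings.

```java
import java.nio.ByteBuffer;

/** Hypothetical model of one per-field entry in the current .vem layout. */
final class VemFieldEntry {
  final int fieldNumber;     // FieldNumber
  final int simFun;          // SimFun: ordinal of the similarity function
  final long vdOffset;       // VDOffset: where this field's vectors start in .vec
  final long vdLength;       // VDLength
  final long viOffset;       // VIOffset: where this field's graph starts in .vex
  final long viLength;       // VILength
  final int dims;            // dims
  final int[] docIds;        // docIds; "size" is docIds.length
  final long[] graphOffsets; // graphOffsets: one .vex offset per vector

  VemFieldEntry(int fieldNumber, int simFun, long vdOffset, long vdLength,
      long viOffset, long viLength, int dims, int[] docIds, long[] graphOffsets) {
    if (docIds.length != graphOffsets.length) {
      throw new IllegalArgumentException("expected one graph offset per document");
    }
    this.fieldNumber = fieldNumber;
    this.simFun = simFun;
    this.vdOffset = vdOffset;
    this.vdLength = vdLength;
    this.viOffset = viOffset;
    this.viLength = viLength;
    this.dims = dims;
    this.docIds = docIds;
    this.graphOffsets = graphOffsets;
  }

  /** Writes the fields in diagram order; the fixed widths are an assumption. */
  byte[] toBytes() {
    int size = docIds.length;
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 * 8 + 4 + 4 + size * (4 + 8));
    buf.putInt(fieldNumber).putInt(simFun);
    buf.putLong(vdOffset).putLong(vdLength).putLong(viOffset).putLong(viLength);
    buf.putInt(dims).putInt(size);
    for (int docId : docIds) {
      buf.putInt(docId);
    }
    for (long offset : graphOffsets) {
      buf.putLong(offset);
    }
    return buf.array();
  }

  /** Reads the fields back in the same order they were written. */
  static VemFieldEntry fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int fieldNumber = buf.getInt();
    int simFun = buf.getInt();
    long vdOffset = buf.getLong();
    long vdLength = buf.getLong();
    long viOffset = buf.getLong();
    long viLength = buf.getLong();
    int dims = buf.getInt();
    int[] docIds = new int[buf.getInt()]; // "size" precedes the docIds
    for (int i = 0; i < docIds.length; i++) {
      docIds[i] = buf.getInt();
    }
    long[] graphOffsets = new long[docIds.length];
    for (int i = 0; i < graphOffsets.length; i++) {
      graphOffsets[i] = buf.getLong();
    }
    return new VemFieldEntry(fieldNumber, simFun, vdOffset, vdLength,
        viOffset, viLength, dims, docIds, graphOffsets);
  }
}
```

The point of the sketch is only that size determines how many docIds and graphOffsets follow, so a reader can consume the entry in a single forward pass.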

 

Proposed .vem index file structure:

 
{code:java}
+-------------+--------+----------+----------+----------+---------+-------
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength| dims  
+-------------+--------+----------+----------+----------+---------+-------

+-------------+-----------+-----+-------------+--------+
| LevelsCount | SizeLevel0| ... | SizeLevelmax| docIds 
+-------------+-----------+-----+-------------+--------+

---+------------+-----+--------------+
ep | NodesLevel1| ... | NodesLevelmax
---+------------+-----+--------------+

--------------------+-----+----------------------+
 graphOffsetsLevel0 | ... | graphOffsetsLevelmax |
--------------------+-----+----------------------+
{code}
 * LevelsCount: the number of levels
 * SizeLevel0, ..., SizeLevelmax: the number of nodes on each level
 * ep: the entry point of the graph on the top level, as a node ordinal
 * NodesLevel1, ..., NodesLevelmax: for each level from 1 to max, the list of 
level 0 ordinals of the nodes contained in that level. It is not necessary to 
store the nodes of level 0, as this level contains all nodes.
 * graphOffsetsLevel0, ..., graphOffsetsLevelmax: the graph offsets for the 
corresponding levels from 0 to max
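
As a rough illustration of how the level metadata fits together, the sketch below keeps the node lists for levels 1 to max explicitly and treats level 0 as the implicit range of all ordinals. The class name HnswLevelsMeta and its methods are hypothetical. The sketch also assumes that each node list is sorted in ascending order and that the offsets of a level are stored in the same order as its node list; the layout above does not state either.

```java
import java.util.Arrays;

/** Hypothetical in-memory view of the level metadata in the proposed .vem layout. */
final class HnswLevelsMeta {
  private final int sizeLevel0;        // SizeLevel0: level 0 contains every node
  private final int entryPoint;        // ep: node ordinal of the entry point on the top level
  private final int[][] nodesByLevel;  // nodesByLevel[0] stays null; levels 1..max are stored
  private final long[][] graphOffsets; // graphOffsets[level][i]: offset of the i-th node on that level

  HnswLevelsMeta(int sizeLevel0, int entryPoint, int[][] upperLevelNodes, long[][] graphOffsets) {
    int levelsCount = upperLevelNodes.length + 1;
    if (graphOffsets.length != levelsCount) {
      throw new IllegalArgumentException("expected one offsets array per level");
    }
    this.sizeLevel0 = sizeLevel0;
    this.entryPoint = entryPoint;
    this.nodesByLevel = new int[levelsCount][];
    for (int level = 1; level < levelsCount; level++) {
      this.nodesByLevel[level] = upperLevelNodes[level - 1];
    }
    this.graphOffsets = graphOffsets;
    for (int level = 0; level < levelsCount; level++) {
      if (graphOffsets[level].length != size(level)) {
        throw new IllegalArgumentException("offset count does not match size of level " + level);
      }
    }
    if (indexOnLevel(levelsCount - 1, entryPoint) < 0) {
      throw new IllegalArgumentException("entry point must be on the top level");
    }
  }

  int levelsCount() {
    return nodesByLevel.length;
  }

  int entryPoint() {
    return entryPoint;
  }

  /** SizeLevel_i: number of nodes on the given level. */
  int size(int level) {
    return level == 0 ? sizeLevel0 : nodesByLevel[level].length;
  }

  /** Position of a node within a level, or -1 if the node is not on that level. */
  int indexOnLevel(int level, int node) {
    if (level == 0) {
      return node >= 0 && node < sizeLevel0 ? node : -1;
    }
    int idx = Arrays.binarySearch(nodesByLevel[level], node); // assumes a sorted node list
    return idx >= 0 ? idx : -1;
  }

  /** Offset in the .vex file of the node's connections on the given level. */
  long graphOffset(int level, int node) {
    int idx = indexOnLevel(level, node);
    if (idx < 0) {
      throw new IllegalArgumentException("node " + node + " is not on level " + level);
    }
    return graphOffsets[level][idx];
  }
}
```

With five nodes on level 0, nodes {1, 3, 4} on level 1 and node {3} on level 2, only the two upper lists are stored; a level 0 lookup uses the ordinal directly as the index into graphOffsetsLevel0.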

 

 


was (Author: mayya):
Proposed .vem index file structure:

 
{code:java}
+-------------+--------+----------+----------+----------+---------+-------
| FieldNumber | SimFun | VDOffset | VDLength | VIOffset | VILength| dims  
+-------------+--------+----------+----------+----------+---------+-------

+-------------+-----------+-----+-------------+--------+
| LevelsCount | SizeLevel0| ... | SizeLevelmax| docIds 
+-------------+-----------+-----+-------------+--------+

---+------------+-----+--------------+
ep | NodesLevel1| ... | NodesLevelmax
---+------------+-----+--------------+

--------------------+-----+----------------------+
 graphOffsetsLevel0 | ... | graphOffsetsLevelmax |
--------------------+-----+----------------------+
{code}
 LevelsCount - number of levels

SizeLevel0, ..., SizeLevelmax - number of nodes on each level

ep - entry point of the graph on the top level, as a node ordinal 

NodesLevel1, ..., NodesLevelmax - list of nodes on each level from 1 to max; it 
is not necessary to store nodes on level 0, as this level contains all nodes.

graphOffsetsLevel0, ..., graphOffsetsLevelmax - graph offsets for corresponding 
levels from 0 to max

 

 

> Handle hierarchy in HNSW graph
> ------------------------------
>
>                 Key: LUCENE-10054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10054
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Mayya Sharipova
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, the HNSW graph is represented as a single-layer graph. 
> We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
> - modify the format in Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
> - run benchmarks


