Comparing two indexes for equality - Finding non stored fieldNames per document

Chetan Mehrotra Mon, 01 Jan 2018 22:33:52 -0800

Hi,

We use Lucene for indexing in Jackrabbit Oak [2]. Recently we
implemented a new indexing approach [1] which traverses the data to be
indexed in a different order from the traversal approach we have been
using so far. The new approach is faster and produces an index with
the same number of documents.


Some notes about the index
--------------------------

- The Lucene index has only one stored field, ':path', which holds the
path of the node in the repository.
- The content being indexed is unstructured, so the set of fields
present may differ from document to document.
- Lucene version 4.7.x
- Both approaches index a given node in the same way. It is only the
traversal order that differs.

Now we need to compare the index produced by the earlier approach
with the one produced by the newer approach to determine whether the
generated indexes are the "same". Because the data is traversed in a
different order, the document ids differ between the two indexes, and
as a result the final index sizes also differ to some extent.

So I would like to implement logic that can compare the two indexes
logically. One way could be to check whether the document with a given
path has the same field names associated with it in both indexes.
However, as the fields are not stored, it is not possible to determine
the field names per document directly.
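
To make the intended check concrete, here is a minimal sketch of the
comparison step only. It assumes the field names per ':path' have
already been collected into a map for each index; how to obtain that
mapping from non-stored fields is exactly the open question below. The
class and method names are hypothetical and are not part of Lucene or
Oak.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene or Oak.
final class IndexFieldDiff {

    /**
     * Compares two path -> field-name maps, one per index.
     * Returns one entry per path that differs; an empty result means
     * both indexes contain the same paths with the same field names.
     * Document ids play no role, so traversal order does not matter.
     */
    static Map<String, String> diff(Map<String, Set<String>> oldIndex,
                                    Map<String, Set<String>> newIndex) {
        Map<String, String> mismatches = new TreeMap<>();

        // Union of all paths seen in either index.
        Set<String> allPaths = new TreeSet<>(oldIndex.keySet());
        allPaths.addAll(newIndex.keySet());

        for (String path : allPaths) {
            Set<String> oldFields = oldIndex.get(path);
            Set<String> newFields = newIndex.get(path);
            if (oldFields == null) {
                mismatches.put(path, "missing in old index");
            } else if (newFields == null) {
                mismatches.put(path, "missing in new index");
            } else if (!oldFields.equals(newFields)) {
                // Report the field names that appear on one side only.
                Set<String> onlyOld = new TreeSet<>(oldFields);
                onlyOld.removeAll(newFields);
                Set<String> onlyNew = new TreeSet<>(newFields);
                onlyNew.removeAll(oldFields);
                mismatches.put(path,
                        "only in old: " + onlyOld + ", only in new: " + onlyNew);
            }
        }
        return mismatches;
    }
}
```

Keying the comparison on the path rather than on the document id is
what makes it independent of the order in which nodes were indexed.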

Questions
--------------

1. Is there any way to find the field names (not the values)
associated with a given document?
2. Is there any other way to logically compare the index data of two
indexes which are generated using different approaches but index the
same content?

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-6353
[2] http://jackrabbit.apache.org/oak/docs/query/lucene.html
