How about the quickest solution: dump the content of both indexes to a
document-per-line text file, sort, diff?

Even if your indexes are large, this will be very fast as long as you
have enough spare disk space.
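
To make the sort-and-diff step concrete, here is a minimal sketch in plain
Java (standard library only). It assumes each index has already been dumped
to a text file with one line per document. The line format used in the usage
note (path, a tab, then whatever per-document data you manage to extract) and
the class name IndexDumpDiff are illustrative assumptions, not part of Lucene
or Oak. It also assumes both dumps fit in memory; for very large dumps an
external sort (for example the Unix sort tool) followed by diff would be the
better fit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for comparing two document-per-line index dumps.
final class IndexDumpDiff {

    // Returns the lines that occur in only one dump:
    // "< " marks a line only in the first file, "> " a line only in the second.
    static List<String> diff(Path first, Path second) throws IOException {
        List<String> a = new ArrayList<>(Files.readAllLines(first, StandardCharsets.UTF_8));
        List<String> b = new ArrayList<>(Files.readAllLines(second, StandardCharsets.UTF_8));

        // Sorting removes the effect of the differing document order.
        Collections.sort(a);
        Collections.sort(b);

        // Merge-walk over both sorted lists, collecting unmatched lines.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c == 0) {
                i++;
                j++;
            } else if (c < 0) {
                out.add("< " + a.get(i++));
            } else {
                out.add("> " + b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add("< " + a.get(i++));
        }
        while (j < b.size()) {
            out.add("> " + b.get(j++));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> differences = diff(Paths.get(args[0]), Paths.get(args[1]));
        for (String line : differences) {
            System.out.println(line);
        }
        System.out.println(differences.isEmpty()
                ? "dumps are identical"
                : differences.size() + " differing lines");
    }
}
```

For example, if the old dump contains "/a<TAB>f1,f2" and "/b<TAB>f1" and the
new dump contains "/a<TAB>f1,f2" and "/b<TAB>f1,f3" (in any order), the
sketch reports the two "/b" lines as differing and nothing for "/a".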


On Tue, Jan 2, 2018 at 7:33 AM, Chetan Mehrotra
<> wrote:
> Hi,
> We use Lucene for indexing in Jackrabbit Oak [2]. Recently we
> implemented a new indexing approach [1] which traverses the data to be
> indexed in a different way compared to the traversal approach we have
> been using so far. The new approach is faster and produces an index
> with the same number of documents.
> Some notes about the index
> ------------------------------------
> - The Lucene index has only one stored field, ':path', for the path of
>   the node in the repository.
> - The content being indexed is unstructured, so the presence of fields
>   may differ.
> - Lucene version: 4.7.x
> - Both approaches would index a given node in the same way. It's just
>   the traversal order that differs.
> Now we need to compare the index produced by the earlier approach
> with the newer one to determine whether the generated index is the
> "same". As the indexed data is traversed in a different order, the
> document IDs would differ between the two indexes, and hence the final
> size differs to some extent.
> So I would like to implement logic that can logically compare two
> indexes. One way could be to check whether a document with a given
> path has the same field names associated with it in both indexes.
> However, as the fields are not stored, it's not possible to determine
> the field names per document.
> Questions
> --------------
> 1. Is there any way to map the field names (not the values) associated
>    with a given document?
> 2. Is there any other way to logically compare the index data between
>    two indexes that are generated using different approaches but index
>    the same content?
> Chetan Mehrotra
> [1]
> [2]
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

