How about the quickest solution: dump the content of each index to a document-per-line text file, sort both files, and diff them?
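A minimal sketch of the dump, sort, diff idea, assuming each index has already been dumped to a text file with one document per line. The line format used here (path, a tab, then a comma-separated list of field names) and the function names are hypothetical, chosen only for illustration. How the dump itself is produced from Lucene is not shown.

```python
def sorted_lines(path):
    """Return the lines of a dump file, sorted, without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return sorted(line.rstrip("\n") for line in f)


def diff_dumps(path_a, path_b):
    """Compare two document-per-line dumps regardless of document order.

    Returns (only_in_a, only_in_b): the lines present in one dump but not
    in the other. Two logically equal indexes yield two empty lists.
    """
    a = sorted_lines(path_a)
    b = sorted_lines(path_b)
    only_a, only_b = [], []
    i = j = 0
    # Merge-walk over the two sorted lists, as a line-based diff would.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        else:
            only_b.append(b[j])
            j += 1
    # Whatever remains on either side has no counterpart in the other dump.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b
```

Because both dumps are sorted before comparison, the differing traversal order (and hence differing document ids) does not matter. This sketch sorts in memory. For dumps that do not fit in memory, the files could be sorted on disk with an external sort first and the same merge-walk applied while streaming.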
Even if your indexes are large, if you have a large spare disk, this will be super fast.

Dawid

On Tue, Jan 2, 2018 at 7:33 AM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> Hi,
>
> We use Lucene for indexing in Jackrabbit Oak [2]. Recently we
> implemented a new indexing approach [1] which traverses the data to be
> indexed in a different way compared to the traversal approach we have
> been using so far. The new approach is faster and produces an index with
> the same number of documents.
>
> Some notes around index
> ------------------------------------
>
> - The Lucene index only has one stored field for ':path' of node in
>   repository.
> - Content being indexed is unstructured so presence of fields may differ
> - Lucene version 4.7.x
> - Both approaches would index a given node in the same way. It's just the
>   traversal order which differs
>
> Now we need to compare the index which is produced by the earlier approach
> with the newer one to determine if the generated index is "same". As
> indexed data is traversed in a different order the documentId would
> differ between the two indexes and hence the final size differs to some
> extent.
>
> So I would like to implement logic which can logically compare 2
> indexes. One way could be to find if a document with a given path in the 2
> indexes has the same fieldNames associated with it. However, as fields are
> not stored it's not possible to determine the fieldNames per document.
>
> Questions
> --------------
>
> 1. Any way to map field names (not the values) associated with a given
>    document
> 2. Any other way to logically compare the index data between 2 indexes
>    which are generated using different approaches but index the same content.
>
> Chetan Mehrotra
> [1] https://issues.apache.org/jira/browse/OAK-6353
> [2] http://jackrabbit.apache.org/oak/docs/query/lucene.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org