Based on the suggestion here, I implemented a script to un-invert the index
(details at OAK-7122 [1], [2]).

The un-inverting was done with the following logic:

// 'infos' (docId -> DocInfo) and getFieldId() are defined elsewhere in the script [2]
def collectFieldNames(DirectoryReader reader) {
    println "Proceeding to collect the field names per document"
    Bits liveDocs = MultiFields.getLiveDocs(reader)
    Fields fields = MultiFields.getFields(reader)
    // Walk the inverted index: field -> term -> postings (doc ids)
    fields.each { String fieldName ->
        Terms terms = fields.terms(fieldName)
        TermsEnum termsEnum = terms.iterator(null)
        while (termsEnum.next() != null) {
            // Only doc ids are needed, so skip frequencies (FLAG_NONE)
            DocsEnum docsEnum = termsEnum.docs(liveDocs, null, DocsEnum.FLAG_NONE)
            while (docsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                int docId = docsEnum.docID()
                DocInfo di = infos.get(docId)
                assert di : "No DocInfo for docId : $docId"
                // Record that this document contains the current field
                di.fieldIds << getFieldId(fieldName)
            }
        }
    }
}
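
The same idea can be sketched without a Lucene index, in plain Java over in-memory maps. This is only an illustration and not the script from [2]: the class name UninvertSketch, the method fieldsPerDoc, and the nested-map stand-in for the inverted index (field -> term -> doc ids) are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene or Oak.
final class UninvertSketch {

    // Input:  field name -> (term -> ids of the documents containing that term).
    // Output: doc id -> sorted set of the field names present in that document.
    static Map<Integer, Set<String>> fieldsPerDoc(
            Map<String, Map<String, List<Integer>>> inverted) {
        Map<Integer, Set<String>> perDoc = new TreeMap<>();
        for (Map.Entry<String, Map<String, List<Integer>>> field : inverted.entrySet()) {
            // Every postings list of this field contributes the field name
            // to each document it mentions.
            for (List<Integer> postings : field.getValue().values()) {
                for (int docId : postings) {
                    perDoc.computeIfAbsent(docId, id -> new TreeSet<>())
                          .add(field.getKey());
                }
            }
        }
        return perDoc;
    }
}
```

As in the Groovy version, a field is recorded for a document as soon as any term of that field lists the document in its postings, so the cost is proportional to the total number of postings.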

Thanks for all the help!

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-7122
[2] https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/lucene