[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878617#action_12878617 ]
Mark Harwood commented on LUCENE-2454:
--------------------------------------

Yep, I can see an app with a thousand cached filters would have a problem with this impl as it stands.
Maintaining parallel indexes always feels a little flaky to me, not least because you lose the transactional integrity you get from using a single index.

Is another approach to make your cached filters document-type-specific? I.e. they only hold numbers in the range of zero to number-of-docs-of-this-type.
To use a cached doc ID in such a filter you would need mapping arrays to project the type-specific doc ID numbers into global doc ID references and back.

Let's imagine an index with a mix of "A", "B" and "C" doc types organised as follows (doc IDs are zero-based, as in Lucene):

docId  docType
=====  =======
0      A
1      B
2      C
3      A
4      C
5      C

The mapping arrays for docType "C" would look as follows:

{code:title=Bar.java|borderStyle=solid}
// sparse, sized 0 -> num docs in overall index; -1 means "not a type C doc"
int[] globalDocIdToTypeCLookUp = {-1, -1, 0, -1, 1, 2};
// dense, sized 0 -> num type C docs in overall index
int[] typeCToGlobalDocIdLookUp = {2, 4, 5};
{code}

Your cached filters would be created as follows:

{code:title=Bar.java|borderStyle=solid}
// this line is hopefully where you save RAM!
OpenBitSet myTypeCBitSet = new OpenBitSet(numberOfTypeCDocs);
// for all matching type C docs...
myTypeCBitSet.set(globalDocIdToTypeCLookUp[realDocId]);
{code}

Your filters can then be used by dereferencing the child doc IDs as follows:

{code:title=Bar.java|borderStyle=solid}
// nextSetBit returns -1 when no bit is set, so check before indexing
int nextRealDocId = typeCToGlobalDocIdLookUp[myTypeCBitSet.nextSetBit(0)];
{code}

Clearly the mapping arrays come at a cost: 4 bytes * num docs in the overall index for the sparse array, plus 4 bytes * num type C docs for the dense one, which is non-trivial.
The sparse globalDocIdToTypeCLookUp array shown here could be avoided by reading TermDocs and counting at cached-filter-create time.
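For anyone who wants to try the idea outside Lucene, here is a minimal, self-contained sketch of the mapping-array scheme described above. It is only an illustration under a few assumptions: doc IDs are zero-based, java.util.BitSet stands in for OpenBitSet so the code runs without Lucene on the classpath, and the class name TypeSpecificFilterSketch and its methods are hypothetical, not part of the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: a type-specific filter backed by two mapping arrays. */
final class TypeSpecificFilterSketch {
    final int[] globalToType; // sparse: one slot per doc in the index, -1 if not this type
    final int[] typeToGlobal; // dense: one slot per doc of this type

    TypeSpecificFilterSketch(String[] docTypes, String type) {
        globalToType = new int[docTypes.length];
        Arrays.fill(globalToType, -1);

        // First pass: count docs of the requested type to size the dense array.
        int count = 0;
        for (String t : docTypes) {
            if (t.equals(type)) count++;
        }
        typeToGlobal = new int[count];

        // Second pass: assign type-specific IDs in global doc ID order.
        int next = 0;
        for (int docId = 0; docId < docTypes.length; docId++) {
            if (docTypes[docId].equals(type)) {
                globalToType[docId] = next;
                typeToGlobal[next] = docId;
                next++;
            }
        }
    }

    /** Builds a filter sized by the number of docs of this type, not by the whole index. */
    BitSet buildFilter(int[] matchingGlobalDocIds) {
        BitSet bits = new BitSet(typeToGlobal.length);
        for (int docId : matchingGlobalDocIds) {
            int typeId = globalToType[docId];
            if (typeId >= 0) bits.set(typeId); // ignore docs of other types
        }
        return bits;
    }

    /** Projects every set bit back to its global doc ID. */
    List<Integer> globalDocIds(BitSet filter) {
        List<Integer> result = new ArrayList<>();
        for (int i = filter.nextSetBit(0); i >= 0; i = filter.nextSetBit(i + 1)) {
            result.add(typeToGlobal[i]);
        }
        return result;
    }
}
```

Constructing it with the doc types {"A","B","C","A","C","C"} and type "C" yields the same sparse and dense arrays as in the example above; a filter built from global doc IDs 2 and 5 then occupies only three bits and maps back to 2 and 5.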
> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LuceneNestedDocumentSupport-1.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.