[jira] Commented: (LUCENE-2454) Nested Document query support

Mark Harwood (JIRA) Mon, 14 Jun 2010 08:37:43 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878617#action_12878617
 ]


Mark Harwood commented on LUCENE-2454:
--------------------------------------

Yep, I can see an app with a thousand cached filters would have a problem with 
this impl as it stands. 

Maintaining parallel indexes always feels a little flaky to me, not least 
because you lose the transactional integrity that a single index gives you.

Would another approach be to make your cached filters document-type-specific? 
I.e. they would only hold numbers in the range of zero to 
number-of-docs-of-this-type.
To use a cached doc ID in such a filter you would need mapping arrays to 
project the type-specific doc ID numbers into global doc ID references and 
back.
Let's imagine an index with a mix of "A", "B" and "C" doc types organised as 
follows (doc IDs count from zero, as the lookup arrays below do):
docId    docType
=====  =======
0            A
1            B
2            C
3            A
4            C
5            C

The mapping arrays for docType "C" would look as follows:
{code:title=Bar.java|borderStyle=solid}
// sparse, sized 0 -> num docs in overall index; -1 means "not a type C doc"
int[] globalDocIdToTypeCLookUp = {-1, -1, 0, -1, 1, 2};
// dense, sized 0 -> num type C docs in overall index
int[] typeCToGlobalDocIdLookUp = {2, 4, 5};
{code}
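
As a rough sketch of how the two lookup arrays could be built in one pass, 
assuming the doc type of every document is already available as an in-memory 
array indexed by global doc ID (counting from zero, as Lucene does). The class 
name TypeDocIdMaps and its fields are hypothetical, not part of the attached 
patch or of Lucene:

```java
import java.util.Arrays;

/** Hypothetical helper: builds both lookup arrays for a single doc type. */
final class TypeDocIdMaps {
    /** Sparse: one slot per doc in the index, -1 for docs of other types. */
    final int[] globalToType;
    /** Dense: one slot per doc of the wanted type. */
    final int[] typeToGlobal;

    TypeDocIdMaps(char[] docTypes, char wantedType) {
        globalToType = new int[docTypes.length];
        Arrays.fill(globalToType, -1);

        // First pass: count docs of the wanted type to size the dense array.
        int count = 0;
        for (char t : docTypes) {
            if (t == wantedType) {
                count++;
            }
        }
        typeToGlobal = new int[count];

        // Second pass: assign type-specific IDs in global doc ID order.
        int next = 0;
        for (int globalId = 0; globalId < docTypes.length; globalId++) {
            if (docTypes[globalId] == wantedType) {
                globalToType[globalId] = next;
                typeToGlobal[next] = globalId;
                next++;
            }
        }
    }
}
```

For the six docs A, B, C, A, C, C this gives a sparse array of 
{-1, -1, 0, -1, 1, 2} and a dense array of {2, 4, 5}.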

Your cached filters would be created as follows:
{code:title=Bar.java|borderStyle=solid}
// sizing by the type C doc count is hopefully where you save RAM!
OpenBitSet myTypeCBitSet = new OpenBitSet(numberOfTypeCDocs);
// for all matching type C docs...
myTypeCBitSet.set(globalDocIdToTypeCLookUp[realDocId]);
{code}

Your filters can then be used by dereferencing the child doc IDs as follows:
{code:title=Bar.java|borderStyle=solid}
int typeCDocId = myTypeCBitSet.nextSetBit(0);  // first set bit, or -1 if none
int nextRealDocId = typeCToGlobalDocIdLookUp[typeCDocId];
{code}
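
To make the round trip concrete, here is a minimal self-contained sketch of 
creating a type-specific filter and walking it back to global doc IDs. It 
uses java.util.BitSet as a stand-in for OpenBitSet so that it runs without 
Lucene on the classpath; the class TypeSpecificFilter and its method names 
are hypothetical:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical cached filter that stores one bit per doc of a single type. */
final class TypeSpecificFilter {
    private final BitSet bits;        // sized by docs of this type, not by the whole index
    private final int[] typeToGlobal; // dense lookup back to global doc IDs

    TypeSpecificFilter(int[] globalToType, int[] typeToGlobal, int[] matchingGlobalIds) {
        this.typeToGlobal = typeToGlobal;
        this.bits = new BitSet(typeToGlobal.length);
        for (int globalId : matchingGlobalIds) {
            int typeId = globalToType[globalId];
            if (typeId >= 0) {        // skip docs that belong to another type
                bits.set(typeId);
            }
        }
    }

    /** Dereferences every set bit into its global doc ID, in ascending order. */
    List<Integer> matchingGlobalDocIds() {
        List<Integer> out = new ArrayList<>();
        for (int typeId = bits.nextSetBit(0); typeId >= 0; typeId = bits.nextSetBit(typeId + 1)) {
            out.add(typeToGlobal[typeId]);
        }
        return out;
    }
}
```

The loop over nextSetBit stops when it returns -1, which is the case the 
single-line dereference needs to guard against before indexing the array.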
  
Clearly the mapping arrays come at a cost of 4 bytes * num docs, which is 
non-trivial. The sparse globalDocIdToTypeCLookUp array shown here could be 
avoided by reading TermDocs and counting at cached-filter-create time.
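
One way the counting idea might look, sketched without Lucene: a sorted int 
array stands in for the ascending doc IDs that TermDocs would return for the 
doc-type term, and the position reached while walking it serves as the 
type-specific ID. The class and method names are hypothetical, and both 
inputs are assumed to be sorted in ascending order:

```java
import java.util.BitSet;

/** Hypothetical builder that derives type-specific IDs by counting, with no sparse array. */
final class CountingFilterBuilder {
    /**
     * @param typeDocIdsAscending     global IDs of all docs of the type, ascending
     * @param matchingDocIdsAscending global IDs of docs matching the filter, ascending
     * @return a bitset indexed by type-specific doc ID
     */
    static BitSet build(int[] typeDocIdsAscending, int[] matchingDocIdsAscending) {
        BitSet bits = new BitSet(typeDocIdsAscending.length);
        int m = 0; // cursor into the matching doc IDs
        for (int ordinal = 0;
             ordinal < typeDocIdsAscending.length && m < matchingDocIdsAscending.length;
             ordinal++) {
            int globalId = typeDocIdsAscending[ordinal];
            // Skip matches that precede this doc (for example, docs of other types).
            while (m < matchingDocIdsAscending.length && matchingDocIdsAscending[m] < globalId) {
                m++;
            }
            if (m < matchingDocIdsAscending.length && matchingDocIdsAscending[m] == globalId) {
                bits.set(ordinal); // the running count is the type-specific ID
            }
        }
        return bits;
    }
}
```

This trades the per-type 4 bytes * num docs array for an extra pass over the 
type's postings each time a filter is created.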


> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LuceneNestedDocumentSupport-1.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

