[jira] Commented: (LUCENE-2454) Nested Document query support

Paul Elschot (JIRA) Mon, 28 Feb 2011 09:23:06 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000413#comment-13000413 ]


Paul Elschot commented on LUCENE-2454:
--------------------------------------

bq. I think your proposal here is related to a new (to me) use case where 
clients can add a single new "child" document and the index automagically 
reorganises to assemble all prior related documents back into a structure where 
they are grouped as contiguous documents held in the same segment?

Indeed.

The first two fields match the ones I intended.
The third field, for the document type, would be quite useful for searching, 
but it may not be needed to maintain the document order.

The intention is quite simple: allow a set of documents to be used to provide a 
single score value during query searching. AFAICT that fits most of the use 
cases described here.
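The idea of a set of documents yielding one score can be illustrated with a 
toy model. This is a minimal sketch, not Lucene API: it assumes the layout 
discussed here, where each set is stored contiguously with its representative 
AFTER its children (post-order), and a java.util.BitSet marks the 
representatives. The class name SetScoring and the choice to sum child scores 
are hypothetical; max or average would fit the same shape.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (not Lucene API): docs of one segment are numbered 0..maxDoc-1,
 * each set is contiguous with its representative stored LAST (post-order),
 * and a BitSet marks the representative doc ids.
 */
public class SetScoring {

    /**
     * Folds the scores of matching docs into one score per set.
     * Summing is an assumption made for this sketch.
     *
     * @param representatives representative doc ids (post-order layout)
     * @param docScores       matching doc id -> score
     * @return representative doc id -> combined score of its set
     */
    public static Map<Integer, Float> scorePerSet(BitSet representatives,
                                                  Map<Integer, Float> docScores) {
        Map<Integer, Float> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Float> e : docScores.entrySet()) {
            // Post-order: a doc's representative is the next set bit at or after it.
            int rep = representatives.nextSetBit(e.getKey());
            if (rep < 0) {
                continue; // doc after the last representative belongs to no set
            }
            result.merge(rep, e.getValue(), Float::sum);
        }
        return result;
    }
}
```

With docs 0 and 1 as children of representative 2, and doc 3 as the only 
child of representative 4, matches on 0, 1 and 3 yield exactly two scores, one 
at doc 2 and one at doc 4.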

To allow conjunctions inside such a set, it is necessary to advance() a scorer 
into a set, and for that it might be better to put the set representative 
before the children. The document order would then be pre-order instead of 
post-order; keeping the docs in order would be no harder either way.
With the representative before the children, an extra operation (something 
like previousDocId()) would be needed on the filter's iterator.
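In a toy model of the pre-order layout, the extra operation corresponds to 
java.util.BitSet.previousSetBit(int): mapping a matching child back to its 
set means looking backwards for the nearest representative, whereas in 
post-order nextSetBit(int) suffices. The sketch below is an assumption-laden 
illustration, not Lucene's filter iterator; PreOrderSets and its methods are 
hypothetical names.

```java
import java.util.BitSet;

/**
 * Toy model (not Lucene API) of the pre-order layout: each representative
 * precedes its children, and a BitSet marks the representative doc ids.
 */
public class PreOrderSets {

    /**
     * Representative of a doc: the nearest set bit at or before it.
     * This backwards step plays the role of a "previousDocId()" on the filter.
     * Returns -1 if the doc precedes every representative.
     */
    public static int representativeOf(BitSet reps, int doc) {
        return reps.previousSetBit(doc);
    }

    /**
     * Exclusive end of the set whose representative is rep: the next
     * representative, or maxDoc for the last set in the segment.
     * A child scorer advanced into this set only needs docs rep+1 .. end-1.
     */
    public static int endOfSet(BitSet reps, int rep, int maxDoc) {
        int next = reps.nextSetBit(rep + 1);
        return next < 0 ? maxDoc : next;
    }
}
```

For example, with representatives at docs 0 and 3 in a six-doc segment, a 
child scorer can advance() to doc 1 to enter the first set and stops before 
doc 3; a match on doc 4 maps back to representative 3.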

I don't know about flushes during merging. One operation that would recur 
during index maintenance is appending a sequence of documents from one segment 
to another segment (see docs 1, 2 and 3 above).
This is indeed what needs to be done when a new child is added, or when an 
existing one is changed, i.e. deleted and added.
I'm not familiar with the merging code, but I would suppose that something 
very close to appending a sequence of documents from an existing segment is 
already available. In any case this is costly, but that is the price to pay.
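The maintenance step can be sketched in a toy model where a segment is just 
an append-only list of documents plus a deletion bit set. This is not how 
Lucene's merge code is structured; SetAppender, Segment and addChild are 
hypothetical names, and the pre-order layout (representative first) is 
assumed so that a new child can simply be appended last.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Toy model (not Lucene's merge code): a segment is an append-only list of
 * docs plus a deletion bit set. Sets use the pre-order layout.
 */
public class SetAppender {

    public static final class Segment {
        final List<String> docs = new ArrayList<>();
        final BitSet deleted = new BitSet();

        int add(String doc) {
            docs.add(doc);
            return docs.size() - 1;
        }
    }

    /**
     * Adds newChild to the set occupying docs [start, end) of source by
     * appending the live docs of that set, then the new child, to target and
     * deleting the old range, so the set stays contiguous.
     *
     * @return doc id of the set's representative in target
     */
    public static int addChild(Segment source, int start, int end,
                               String newChild, Segment target) {
        int newStart = target.docs.size();
        for (int doc = start; doc < end; doc++) {
            if (!source.deleted.get(doc)) {
                target.add(source.docs.get(doc)); // keep original order
            }
        }
        target.add(newChild);           // pre-order: new child goes last
        source.deleted.set(start, end); // old copies of the set are dead
        return newStart;
    }
}
```

Changing an existing child fits the same routine: mark the old child deleted 
in source first, so the copy loop skips it, and pass the new version as 
newChild. The whole set is copied either way, which is where the cost lies.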

During searching, the term filters used for the node representatives might 
benefit from some optimizations. Since one term filter is needed for every 
document scorer involved in searching the query, and these term filters are 
all based on the same term, they could share index information, for example 
in a filter cache.
A bit set is not always optimal for such filters; perhaps a more tree-like 
structure could be more compact and faster. But bit sets could be used to get 
this going.
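Sharing one filter among all scorers for the same term can be sketched as a 
small cache that builds the bit set once per term and hands the same instance 
to every caller. This is a minimal sketch, not Lucene's filter caching API: 
RepresentativeFilterCache is a hypothetical name, and the loader function 
stands in for whatever reads the term's postings. Callers must treat the 
shared BitSet as read-only.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Minimal sketch (not Lucene API): every scorer asking for the representative
 * filter of the same term receives the same cached BitSet, so the term's
 * postings are read only once.
 */
public class RepresentativeFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader;
    private int loads = 0;

    /** @param loader hypothetical callback building the bit set for a term */
    public RepresentativeFilterCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    /** Returns the shared bit set for term, building it on first use only. */
    public synchronized BitSet get(String term) {
        return cache.computeIfAbsent(term, t -> {
            loads++;
            return loader.apply(t);
        });
    }

    /** Number of times the loader actually ran. */
    public synchronized int loads() {
        return loads;
    }
}
```

Swapping BitSet for a more compact tree-like structure would only change the 
cached value type; the sharing logic stays the same.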

The good news so far for me is that this seems to be feasible, thanks.



> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
