[
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000413#comment-13000413
]
Paul Elschot commented on LUCENE-2454:
--------------------------------------
bq. I think your proposal here is related to a new (to me) use case where
clients can add a single new "child" document and the index automagically
reorganises to assemble all prior related documents back into a structure where
they are grouped as contiguous documents held in the same segment?
Indeed.
The first two fields match the ones I intended.
The third field, for the document type, would be quite useful for searching,
but it may not be needed for maintaining the document order.
The intention is quite simple: allow a set of documents to be used to provide a
single score value during query searching. AFAICT that fits most of the use
cases described here.
To allow conjunctions inside such a set, it is necessary to advance() a scorer
into a set, and for that it might be better to put the set representative
before the children. The document order would then be pre-order instead of
post-order, which should make it no more difficult to keep the docs in order.
With the representative before the children, an extra operation (something like
previousDocId()) would be needed on the iterator of the filter.
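
A minimal sketch of the pre-order layout, assuming the representatives are
marked in a java.util.BitSet over the doc ids of one segment. The class and
method names are hypothetical and not Lucene API; previousSetBit() stands in
for the previousDocId()-like operation, and summing is only one possible way to
fold the scores of a set into a single value.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Lucene API. Doc ids are plain ints in one segment,
// laid out in pre-order: each representative is followed by its children.
final class PreOrderSets {

    // Representative of the set containing doc: the nearest marked doc at or
    // before it. Returns -1 if no representative precedes doc.
    static int representativeOf(BitSet representatives, int doc) {
        return representatives.previousSetBit(doc);
    }

    // First doc id past the set that starts at representative rep. A scorer
    // advanced into the set would have to stay below this bound.
    static int endOfSet(BitSet representatives, int rep, int maxDoc) {
        int next = representatives.nextSetBit(rep + 1);
        return next < 0 ? maxDoc : next;
    }

    // Fold per-document scores into one score per set, keyed by the doc id of
    // the representative. Summing is an arbitrary choice for this sketch.
    static Map<Integer, Float> scorePerSet(BitSet representatives,
                                           int[] docs, float[] scores) {
        Map<Integer, Float> result = new LinkedHashMap<>();
        for (int i = 0; i < docs.length; i++) {
            int rep = representativeOf(representatives, docs[i]);
            if (rep >= 0) {
                result.merge(rep, scores[i], Float::sum);
            }
        }
        return result;
    }
}
```

With representatives at docs 0, 4 and 6, a match in doc 2 is attributed to the
set starting at doc 0, and that set ends just before doc 4.
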
I don't know about flushes during merging. One operation that would recur
during index maintenance is appending a sequence of documents from one segment
to another segment, see docs 1, 2 and 3 above.
This is indeed what needs to be done when a new child is added, or when an
existing one is changed, i.e. deleted and added.
I'm not familiar with the merging code, but I would suppose something very
close to appending a sequence of documents from an existing segment is already
available. Anyway, this is costly, but that is the price to pay.
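
To make the append operation concrete, here is a toy sketch, assuming a segment
is just a list of documents plus a deleted-docs bit set. ToySegment,
appendRange() and addChild() are hypothetical names for illustration only;
Lucene's actual merging code is not modelled here.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy segment, not Lucene's segment or merge code.
final class ToySegment {
    final List<String> docs = new ArrayList<>();
    final BitSet deleted = new BitSet();

    // Copy docs [from, to) of source to the end of this segment, keeping
    // their order, then mark the originals as deleted in the source.
    // Returns the doc id of the first copied document in this segment.
    int appendRange(ToySegment source, int from, int to) {
        int start = docs.size();
        for (int i = from; i < to; i++) {
            docs.add(source.docs.get(i));
        }
        source.deleted.set(from, to);
        return start;
    }

    // Adding a child: move the whole existing set to the target segment,
    // then append the new child so the set stays contiguous.
    static int addChild(ToySegment oldSeg, int from, int to,
                        ToySegment newSeg, String child) {
        int start = newSeg.appendRange(oldSeg, from, to);
        newSeg.docs.add(child);
        return start;
    }
}
```

The cost mentioned above shows up directly: every added child copies the whole
set again and leaves the old copies behind as deletions.
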
During searching, the term filters used for the node representatives might use
some optimizations. Since one term filter is needed for every document scorer
involved in searching the query and these term filters are all based on the
same term, they could share index information, for example in a filter cache.
A bit set is not always optimal for such filters; perhaps a more tree-like
structure could be more compact and faster. But bit sets could be used to get
this going.
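
The sharing idea could look roughly like this, assuming a cache keyed by the
term text that hands out one bit set per term. RepresentativeFilterCache and
its loader are hypothetical, not an existing Lucene class; the loader stands in
for whatever reads the postings of the term.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a per-term filter cache, not Lucene API.
final class RepresentativeFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader;
    int loads = 0; // how often the loader actually ran, for illustration

    RepresentativeFilterCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    // Every scorer asking for the same term gets the same bit set instance.
    // Callers must treat it as read-only, since it is shared.
    BitSet get(String term) {
        return cache.computeIfAbsent(term, t -> {
            loads++;
            return loader.apply(t);
        });
    }
}
```

This version is not thread safe and never evicts entries; both would need
attention in a real searcher.
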
The good news so far for me is that this seems to be feasible, thanks.
> Nested Document query support
> -----------------------------
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Affects Versions: 3.0.2
> Reporter: Mark Harwood
> Assignee: Mark Harwood
> Priority: Minor
> Attachments: LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene