[jira] [Commented] (LUCENE-2454) Nested Document query support

Michael McCandless (JIRA) Thu, 26 May 2011 03:18:37 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039623#comment-13039623 ]


Michael McCandless commented on LUCENE-2454:
--------------------------------------------

bq. I'll need to check LUCENE-3129 for equivalence with PerParentLimitQuery. 
It's certainly a central part of what I typically deploy for nested queries - 
pass 1 is usually a NestedDocumentQuery to get the best parents and pass 2 uses 
PerParentLimitQuery to get the best children for these best parents.

Hmm, so I wonder if we could do this in one pass?  Ie, like grouping,
if you indexed your docs as blocks, you can use the faster single-pass
collector; but if you didn't, you can use the more general but slower
and more-RAM-consuming two pass collector.

It seems like we should be able to do something similar with joins,
somehow... ie Solr's join impl is a start at the "fully general"
two-pass solution.

But I agree the "join child to parent" and then "grouping of child
docs" go hand in hand for searching...
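
One possible reading of the single-pass idea, as a rough sketch only: if
children are indexed contiguously with their parent last in each block, and
hits arrive in increasing doc ID order, then all hits for one parent are
adjacent, so a collector only needs state for the current parent. The class
and method names below are hypothetical, java.util.BitSet stands in for
whatever bitset the real impl would use, and nothing here is taken from the
attached patch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: keep the best `limit` children per
// parent in one pass over child hits that arrive in doc ID order.
class SinglePassChildGrouping {

    // childDocs: matching child doc IDs, ascending. scores[i] belongs to childDocs[i].
    // parents: one bit per parent doc; each parent is the LAST doc of its block.
    // Returns parent doc -> child docs, best score first.
    static Map<Integer, List<Integer>> topChildren(
            int[] childDocs, float[] scores, BitSet parents, int limit) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        // Min-heap of hit indexes ordered by score, so the weakest hit is evicted first.
        PriorityQueue<Integer> best =
                new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        int currentParent = -1;
        for (int i = 0; i < childDocs.length; i++) {
            // With parent-last blocks, the next set bit is this child's parent.
            int parent = parents.nextSetBit(childDocs[i]);
            if (parent != currentParent) {
                flush(result, currentParent, best, childDocs);
                currentParent = parent;
            }
            best.add(i);
            if (best.size() > limit) {
                best.poll();
            }
        }
        flush(result, currentParent, best, childDocs);
        return result;
    }

    // Drains the heap into the result map for the block that just ended.
    private static void flush(Map<Integer, List<Integer>> result, int parent,
                              PriorityQueue<Integer> best, int[] childDocs) {
        if (best.isEmpty()) {
            return;
        }
        List<Integer> docs = new ArrayList<>();
        while (!best.isEmpty()) {
            docs.add(0, childDocs[best.poll()]); // weakest comes out first, so prepend
        }
        result.put(parent, docs);
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3); // block 1: children 0..2, parent 3
        parents.set(6); // block 2: children 4..5, parent 6
        int[] childDocs = {0, 1, 2, 4, 5};
        float[] scores = {0.5f, 0.9f, 0.7f, 0.2f, 0.8f};
        System.out.println(topChildren(childDocs, scores, parents, 2));
    }
}
```

This only limits children per parent; ranking the parents themselves would
still need its own queue, and docs not indexed as blocks would need the
general two-pass approach.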

What do you do for facet counting in these apps...?  Post-grouping
faceting also ties in here.

bq. Of course some apps can simply fetch ALL children for the top parents but 
in some cases summarising children is required

Right...

bq.  (note: this is potentially a great solution for performance issues on 
highlighting big docs e.g. entire books).

I think it'd be compelling to index books/articles with each
page/section/chapter being a new doc, and then group them under their
book/article.

bq. I haven't benchmarked nextSetBit vs the existing "rewind" implementation 
but I imagine it may be quicker.

I think it should be much faster -- obs.nextSetBit looks heavily
optimized, since it can operate a word at a time.  Though, if the
groups are smallish, so that nextSetBit is often maybe 2 or 3 bits
away, I'm not sure it'd be faster...
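
To make the comparison concrete, here is a small sketch of the two lookups,
assuming blocks are laid out as children followed by their parent. It uses
java.util.BitSet as a stand-in for the bitset discussed above, and the class
and method names are made up for illustration.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Lucene: finds the parent of a child doc
// when each block is indexed as [child, child, ..., parent].
class ParentLastLookup {

    // Scan forward using nextSetBit, which can skip whole words of zero bits.
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc); // -1 if no parent follows
    }

    // Same answer, testing one bit at a time, for comparison.
    static int parentOfLinear(BitSet parents, int childDoc) {
        int doc = childDoc;
        while (doc < parents.length() && !parents.get(doc)) {
            doc++;
        }
        return doc < parents.length() ? doc : -1;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3); // block 1: children 0..2, parent 3
        parents.set(6); // block 2: children 4..5, parent 6
        System.out.println(parentOf(parents, 0));       // 3
        System.out.println(parentOf(parents, 4));       // 6
        System.out.println(parentOfLinear(parents, 4)); // 6
    }
}
```

When the parent is only 2 or 3 bits away, both versions stay inside a single
word, so the difference may be small; only a benchmark would settle it.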

bq. Parent-followed-by-children seems more natural from a user's point of view 
however.

But is it really so bad to ask the app to put parent doc last?

I mean, the docs have to be indexed w/ the new doc block APIs in IW
anyway, which will often be eg a List<Document>, at which point
putting parent last seems a minor imposition?

Since this is an expert API I think it's OK to put [minor] impositions
on its usage if this can simplify the impl / make it faster / less
risky.  That said, I'm not yet sure on the impl (single pass query +
collector vs generic two-pass join that solr now has), so it's
probably premature to worry about this...

bq. I guess you could always keep the parent-then-child insertion order but 
flip the bitset (then cache) for query execution if that was faster.

True but this adds some hair into the impl (we must also "flip" coming
back from nextSetBit)...
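
A rough sketch of what that extra "hair" might look like, assuming
parent-first blocks and a known maxDoc: doc i is mirrored to maxDoc - 1 - i
when the cached bitset is built, and again on every result coming back from
nextSetBit. java.util.BitSet is used as a stand-in and all names are
hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene code: parent lookup for blocks indexed as
// [parent, child, child, ...] using a mirrored ("flipped") parent bitset.
class FlippedParentLookup {

    // Build once and cache: bit p in `parents` becomes bit (maxDoc - 1 - p).
    // Assumes every set bit in `parents` is below maxDoc.
    static BitSet flip(BitSet parents, int maxDoc) {
        BitSet flipped = new BitSet(maxDoc);
        for (int p = parents.nextSetBit(0); p >= 0; p = parents.nextSetBit(p + 1)) {
            flipped.set(maxDoc - 1 - p);
        }
        return flipped;
    }

    // The parent precedes the child in doc order, so in mirrored space it
    // follows it; nextSetBit finds it and the result is mirrored back.
    static int parentOf(BitSet flipped, int maxDoc, int childDoc) {
        int hit = flipped.nextSetBit(maxDoc - 1 - childDoc);
        return hit < 0 ? -1 : maxDoc - 1 - hit;
    }

    public static void main(String[] args) {
        int maxDoc = 7;
        BitSet parents = new BitSet();
        parents.set(0); // block 1: parent 0, children 1..3
        parents.set(4); // block 2: parent 4, children 5..6
        BitSet flipped = flip(parents, maxDoc);
        System.out.println(parentOf(flipped, maxDoc, 2)); // 0
        System.out.println(parentOf(flipped, maxDoc, 5)); // 4
    }
}
```

Every caller has to remember both conversions, which is the added complexity
in question; requiring parent-last at index time avoids it entirely.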

bq. Benchmarking rewind vs nextSetBit vs flip then nextSetBit would reveal all.

True, though it'd be best to do this in the context of the actual join impl...


> Nested Document query support
> -----------------------------
>
>                 Key: LUCENE-2454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2454
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira