[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055096#comment-13055096 ]
Michael McCandless commented on LUCENE-3171:
--------------------------------------------

bq. The possible inefficiency is the same as the one for any sparsely filled OpenBitSet.

Ahh, OK. Though I suspect this (the linear scan OBS does for next/prevSetBit) is a minor cost overall, if indeed the app has so many child docs per parent that a sparse bit set would be warranted? I.e., the Query/Collector would still be visiting these many child docs per parent, I guess? (Unless the query hits few results.)

I don't think a jdoc warning is really required for this... but I'm fine if you want to add one?

I'll commit this soon and resolve LUCENE-2454 as duplicate!

> BlockJoinQuery/Collector
> ------------------------
>
> Key: LUCENE-3171
> URL: https://issues.apache.org/jira/browse/LUCENE-3171
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/other
> Reporter: Michael McCandless
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch
>
>
> I created a single-pass Query + Collector to implement nested docs.
> The approach is similar to LUCENE-2454, in that the app must index
> documents in "join order", as a block (IW.add/updateDocuments), with
> the parent doc at the end of the block, except that this impl is one
> pass.
>
> Once you join at indexing time, you can take any query that matches
> child docs and join it up to the parent docID space, using
> BlockJoinQuery. You then use BlockJoinCollector, which sorts parent
> docs by the provided Sort, to gather results, grouped by parent; this
> collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
> retains the child docs corresponding to each collected parent doc.
> After searching is done, you retrieve the TopGroups from a provided
> BlockJoinQuery.
>
> Like LUCENE-2454, this is less general than the arbitrary joins in
> Solr (SOLR-2272) or parent/child from ElasticSearch
> (https://github.com/elasticsearch/elasticsearch/issues/553), since you
> must do the join at indexing time as a doc block, but it should be
> able to handle nested joins as well as joins to multiple tables,
> though I don't yet have test cases for these.
>
> I put this in a new Join module (modules/join); I think as we
> refactor join impls we should put them here.

--
This message is automatically generated by JIRA.
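A minimal sketch of the doc ID arithmetic that block indexing makes possible, assuming each block is laid out as [child, ..., child, parent] and that a bit set marks the parent docs. It uses java.util.BitSet rather than OpenBitSet or any Lucene API, and the class and method names (BlockJoinSketch, parentOf, firstChildOf, joinToParents) are hypothetical; this is not the patch's code. Because child hits arrive in increasing doc ID order and each parent follows its children, one forward pass is enough to group hits by parent.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the LUCENE-3171 patch: docs are assumed to be
// indexed in blocks [child, ..., child, parent], and `parents` marks the
// doc IDs of parent docs.
public class BlockJoinSketch {

    // The parent of a child doc is the first parent doc at or after it.
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    // A parent's first child is one past the previous parent;
    // previousSetBit(-1) returns -1, so the first block starts at doc 0.
    static int firstChildOf(BitSet parents, int parentDoc) {
        return parents.previousSetBit(parentDoc - 1) + 1;
    }

    // One forward pass over child hits (in increasing doc ID order),
    // grouping them by parent doc ID in the order parents are reached.
    static Map<Integer, List<Integer>> joinToParents(BitSet parents, int[] childHits) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int child : childHits) {
            if (parents.get(child)) {
                continue; // a parent doc is not a child hit
            }
            int parent = parentOf(parents, child);
            if (parent < 0) {
                continue; // malformed index: child with no parent after it
            }
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Docs 0-2 are children of parent 3; docs 4-5 are children of parent 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);

        System.out.println(joinToParents(parents, new int[] {1, 2, 5})); // {3=[1, 2], 6=[5]}
        System.out.println(firstChildOf(parents, 6)); // 4
    }
}
```

A real implementation also has to score and sort the parents (BlockJoinCollector sorts by the provided Sort); the sketch only shows why the parent-last block layout turns the join into a one-pass operation.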
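To illustrate the sparse-bit-set cost discussed in the comment: in a word-array bit set, nextSetBit scans 64-bit words one at a time until it finds a non-zero word, so on a sparse parent set the work grows with the gap between set bits (roughly gap/64 word reads). The WordScanBitSet class below is a hypothetical simplification written for illustration, not OpenBitSet's source; its wordsScanned counter exists only to make the cost visible.

```java
// Hypothetical, simplified word-array bit set (not OpenBitSet's code):
// nextSetBit scans 64-bit words linearly until it finds a non-zero one.
public class WordScanBitSet {
    private final long[] words;
    // Number of words examined by the most recent nextSetBit call.
    int wordsScanned;

    WordScanBitSet(int numBits) {
        words = new long[(numBits + 63) >>> 6];
    }

    void set(int index) {
        // Java masks a long shift count to its low 6 bits, i.e. index % 64.
        words[index >>> 6] |= 1L << index;
    }

    // Returns the index of the first set bit at or after fromIndex, or -1.
    int nextSetBit(int fromIndex) {
        wordsScanned = 0;
        int i = fromIndex >>> 6;
        if (i >= words.length) {
            return -1;
        }
        // Clear the bits below fromIndex in the first word.
        long word = words[i] & (-1L << fromIndex);
        wordsScanned++;
        while (word == 0) {
            if (++i == words.length) {
                return -1;
            }
            word = words[i];
            wordsScanned++;
        }
        return (i << 6) + Long.numberOfTrailingZeros(word);
    }

    public static void main(String[] args) {
        // One parent every 64,000 docs: a very sparse parent bit set.
        WordScanBitSet parents = new WordScanBitSet(128_000);
        parents.set(63_999);
        parents.set(127_999);

        int next = parents.nextSetBit(0);
        System.out.println(next + " after " + parents.wordsScanned + " words"); // 63999 after 1000 words
    }
}
```

Here one parent per 64,000 docs costs about 1,000 word reads per lookup; whether that matters depends on how many child docs the query visits per parent anyway, which is the point made in the comment.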