Mike,

Thanks for the clue. It raises a lot of questions:

- Is this cost caused by the random-access nature of seekCeil() /** The target term may be before or after the current term. */? Is there any chance of making it more efficient by requesting a "forward only" TermEnum?
- Would it be faster with an entirely memory-resident term dictionary?
- Or is the overall idea of using TermEnum just a poor fit here, so that it is worth experimenting with writing the previous parent docnum (or the current block size) in a payload and reading it when we need to jump back on advance()?
- Once again, would you mind reminding me why making DocEnum capable of jumping back is so hard? Can you recommend a starting point for hacking?
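To make the third question concrete, here is a minimal sketch of the two ways of finding the previous parent that are being compared. It deliberately avoids Lucene classes: java.util.BitSet stands in for the rewindable parent bit-set, and two plain int arrays stand in for a forward-only parent doclist with a per-parent payload. The class name, method names, and the "block size = children + 1" payload encoding are all assumptions made for illustration, not existing API.

```java
import java.util.BitSet;

/** Hypothetical illustration only; none of these names exist in Lucene. */
final class BlockJoinSketch {

    /**
     * Bit-set approach: parents are marked in a bit-set that can be read
     * backwards. Returns the parent closing the block before the one that
     * contains target, or -1 if target lies in the first block.
     */
    static int prevParentViaBitSet(BitSet parents, int target) {
        // previousSetBit(-1) is defined to return -1, so target == 0 is safe.
        return parents.previousSetBit(target - 1);
    }

    /**
     * Payload approach: parentDocs is only walked forward, and blockSizes[i]
     * (assumed to be children + 1) plays the role of a payload stored with
     * parent i. The jump back is a subtraction instead of a backward seek.
     */
    static int prevParentViaPayload(int[] parentDocs, int[] blockSizes, int target) {
        for (int i = 0; i < parentDocs.length; i++) {
            if (parentDocs[i] >= target) {
                // Parent of the block containing target, minus its block size.
                return parentDocs[i] - blockSizes[i];
            }
        }
        throw new IllegalArgumentException("target is past the last parent: " + target);
    }

    public static void main(String[] args) {
        // Two blocks: docs 0,1 are children of parent 2; docs 3,4,5 of parent 6.
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(6);
        int[] parentDocs = {2, 6};
        int[] blockSizes = {3, 4};

        for (int target = 0; target <= 6; target++) {
            System.out.println("target=" + target
                + " bitset=" + prevParentViaBitSet(parents, target)
                + " payload=" + prevParentViaPayload(parentDocs, blockSizes, target));
        }
    }
}
```

Both methods should agree for every target up to the last parent. The sketch says nothing about the actual cost of reading a payload compared with a bit-set lookup, which is the open question here.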
Your answers are really appreciated.

On Thu, Feb 13, 2014 at 3:10 PM, Michael McCandless <[email protected]> wrote:

> Unfortunately, the terms dict is quite costly, so e.g. doing a
> TermsEnum.seekCeil inside a DocsEnum.advance will probably really hurt
> performance?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Feb 12, 2014 at 4:12 PM, Mikhail Khludnev
> <[email protected]> wrote:
> > Hello,
> >
> > Some time ago Uwe defined the problem of making block-join more cute.
> > https://issues.apache.org/jira/browse/LUCENE-5092?focusedCommentId=13736713&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13736713
> > I'm not sure I got him right, but recently I thought (what to talk about in
> > Washington) about comprehensive relations modeling cases. Anyway, I started
> > from simple test for alternative block join implementation. The overall idea
> > is:
> > - to keep blocks as-is, they are cute;
> > - to use term enum for looping parents while enumerating children on
> > nextDoc(), hence these terms should be equal to docnums;
> > - to use a single element doclist to jump back to the previous parent for
> > advance().
> >
> > Now you can see that I just tried to reuse trendy Lucene data-structures to
> > get rid of rewindable bit-set. Right now, the code is ugly because I reusing
> > them by plain document indexing, later it can be done better with a
> > specialized codec/enum api. It makes no sense as just a block join
> > replacement, but it might work out as general modelling approach.
> >
> > Here is the code
> > https://github.com/m-khl/solr-patches/blob/af089475ec122630e231dbba397d5639013668e5/lucene/join/src/test/org/apache/lucene/search/join/TestBlockRelations.java?source=cc#L131
> >
> > Here it the scratches which might explain the current implementation
> > http://goo.gl/yS1VZN
> >
> > Your feedback is appreciated.
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<[email protected]>
