[ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186000#comment-13186000 ]
Hoss Man commented on SOLR-2947: -------------------------------- bq. I decided that my 32K patch from 28/Dec is to huge. I'm attaching the shorter version. Mikhail: as someone who wants to help get DIH improvements/bug fixes commited, but probably understands less about DIH internals then you and James, i thank you for splitting out these issues and explaining what's going on, and pressing forward with patches that include tests. The latest "SOLR-2947.patch" looks fairly good to me, and all tests pass (with the caveat that the @Ignored test depends on SOLR-3011 - although that should really be noted in the @Ignore msg) the one thing that really confuses me is the implementation of children(), it smells extremely fishy... {noformat} + Collection<EntityRunner> children(){ + return entityProcessorWrapper.iterator().next().children.values(); + } {noformat} ...why is it sufficient to get the children from only the first element of that iterator? are there any situations where the iterator may not return any elements? I'm guessing the answer to the later is "no" because that would mean 0 threads, correct? (in which case, let's toss a java assert in there for good measure). As for the first question: at first i thought this was because all of the ThreadedEntityProcessorWrapper instances were initialized with identical EntityProcessors so any one would due (in which why not just use the entityProcessor handing directly off the parent EntityRunning) -- but poking around ThreadedEntityProcessorWrapper i see that they seem to always create new EntityRunner instance, so i'm at a loss to understand how that "children()" method makes sense. Is this something that's just a hack that only works with 1 thread until/unless SOLR-3011 is resolved? will this break things for people using multithreaded DIH _without_ nested entities? (i don't think so -- because that would contradict the premise of there being any child entities that need their entityProcessor to be desctoryed -- but i wanted to sanity check the question). Either way: a comment clarifying that monstrosity would be a good idea. James: are you comfortable with the latest patch? > DIH caching bug - EntityRunner destroys child entity processor > -------------------------------------------------------------- > > Key: SOLR-2947 > URL: https://issues.apache.org/jira/browse/SOLR-2947 > Project: Solr > Issue Type: Sub-task > Components: contrib - DataImportHandler > Affects Versions: 4.0 > Reporter: Mikhail Khludnev > Labels: noob > Fix For: 4.0 > > Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, > SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, > dih-cache-threads-enabling-bug.patch > > > My intention is fix multithread import with SQL cache. Here is the 2nd stage. > If I enable DocBuilder.EntityRunner flow even for single thread, it breaks > the pretty basic functionality: parent-child join. > the reason is [line 473 > entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] > breaks children entityProcessor. > see attachement comments for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org