[ 
https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186000#comment-13186000
 ] 

Hoss Man commented on SOLR-2947:
--------------------------------

bq. I decided that my 32K patch from 28/Dec is to huge. I'm attaching the 
shorter version.

Mikhail: as someone who wants to help get DIH improvements/bug fixes commited, 
but probably understands less about DIH internals then you and James, i thank 
you for splitting out these issues and explaining what's going on, and pressing 
forward with patches that include tests.

The latest "SOLR-2947.patch" looks fairly good to me, and all tests pass (with 
the caveat that the @Ignored test depends on SOLR-3011 - although that should 
really be noted in the @Ignore msg) the one thing that really confuses me is 
the implementation of children(), it smells extremely fishy...

{noformat}
+    Collection<EntityRunner> children(){
+        return entityProcessorWrapper.iterator().next().children.values();
+    } 
{noformat}

...why is it sufficient to get the children from only the first element of that 
iterator?  are there any situations where the iterator may not return any 
elements?

I'm guessing the answer to the later is "no" because that would mean 0 threads, 
correct? (in which case, let's toss a java assert in there for good measure).

As for the first question: at first i thought this was because all of the  
ThreadedEntityProcessorWrapper instances were initialized with identical 
EntityProcessors so any one would due (in which why not just use the 
entityProcessor handing directly off the parent EntityRunning) -- but poking 
around ThreadedEntityProcessorWrapper i see that they seem to always create new 
EntityRunner instance, so i'm at a loss to understand how that "children()" 
method makes sense.

Is this something that's just a hack that only works with 1 thread until/unless 
SOLR-3011 is resolved?  will this break things for people using multithreaded 
DIH _without_ nested entities? (i don't think so -- because that would 
contradict the premise of there being any child entities that need their 
entityProcessor to be desctoryed -- but i wanted to sanity check the question). 
 Either way: a comment clarifying that monstrosity would be a good idea. 

James: are you comfortable with the latest patch?
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, 
> SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, 
> dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. 
> If I enable DocBuilder.EntityRunner flow even for single thread, it breaks 
> the pretty basic functionality: parent-child join.
> the reason is [line 473 
> entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup]
>  breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to