[
https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Khludnev updated SOLR-2947:
-----------------------------------
Attachment: SOLR-2947.patch
OK, here is the patch. It fixes the issue with destroy() and the problem with
multiple threads and CachedSqlEntityProcessor.
h3.Code
h4.Context.java, ContextImpl.java
* Removed the SCOPE_DOC constant. I can't find any usages, and the old
implementation isn't thread-safe. We can implement it in a thread-safe way if
you want; let me know if it's necessary.
* Note that ContextImpl.putVal() *ignores the scope provided*. That should be
tracked separately; let me know if you'd like me to raise an issue.
h4.DataImporter.java
I added DocBuilder.destroy() to stop the thread pool after all work is done. I
was bothered by the test cases' warnings about "thread leaks".
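
A minimal sketch of the kind of shutdown such a destroy() hook performs,
assuming the pool is a java.util.concurrent.ExecutorService. The class
ImportWorkerPool, its methods and the 30-second timeout are hypothetical and
are not taken from the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a component that owns worker threads.
class ImportWorkerPool {
    private final ExecutorService pool;

    ImportWorkerPool(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void submit(Runnable task) {
        pool.submit(task);
    }

    // Stops the pool once submitted work is done; calling it again is harmless.
    void destroy() {
        pool.shutdown();                        // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();             // stop waiting and interrupt the workers
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    boolean isStopped() {
        return pool.isTerminated();
    }
}
```

Without a call like this, non-daemon worker threads outlive the import, which
is the usual cause of "thread leak" warnings in test runs.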
h4.DIHCacheSupport.java
It just introduces a getter. Note that I generated the diff against the
uncommitted SOLR-2961, so line numbers may be off; let me know if I should
re-diff it.
h4.DocBuilder.java
* EntityRunner no longer creates EntityProcessors; it obtains one from its
constructor args
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed as not thread-safe; it is now passed
explicitly as a method argument
* EntityRunner.entityEnded wasn't thread-safe either; moved into
ThreadedEntityProcessorWrapper
* object instantiation was substantially reworked to be thread-safe
** a single EntityRunner per entity
** a single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner, each using the runner's
EntityProcessor as its delegate
** where N is the number of threads specified on the root entity (the threads
attr is prohibited for child entities)
** ThreadedEntityProcessorWrappers are numbered by their positions in the
EntityRunner's tepw list
** a parent entity's ThreadedEntityProcessorWrapper always calls the child's
tepw with the same number as its own
* a parent entity's ThreadedEntityProcessorWrapper always calls the child's tepw
via a plain synchronous Java call (without a thread pool)
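
The object layout described above can be sketched roughly as follows. This is
a simplified model under stated assumptions, not the patch itself: the classes
SketchProcessor, SketchWrapper and SketchRunner are hypothetical stand-ins for
EntityProcessor, ThreadedEntityProcessorWrapper and EntityRunner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an EntityProcessor.
class SketchProcessor {
    final String entityName;
    SketchProcessor(String entityName) { this.entityName = entityName; }
}

// Hypothetical stand-in for a ThreadedEntityProcessorWrapper.
class SketchWrapper {
    final int number;                // position in the owning runner's wrapper list
    final SketchProcessor delegate;  // shared by all wrappers of one runner
    SketchWrapper(int number, SketchProcessor delegate) {
        this.number = number;
        this.delegate = delegate;
    }
}

// Hypothetical stand-in for an EntityRunner.
class SketchRunner {
    final SketchProcessor processor;                       // one processor per runner
    final List<SketchWrapper> wrappers = new ArrayList<>(); // N wrappers per runner
    final List<SketchRunner> children = new ArrayList<>();

    SketchRunner(String entityName, int threads) {
        this.processor = new SketchProcessor(entityName);
        for (int i = 0; i < threads; i++) {
            wrappers.add(new SketchWrapper(i, processor));
        }
    }

    // Children take the root's thread count; they never declare their own.
    SketchRunner addChild(String entityName) {
        SketchRunner child = new SketchRunner(entityName, wrappers.size());
        children.add(child);
        return child;
    }

    // A parent wrapper reaches the child wrapper with the same number,
    // through an ordinary method call on the calling thread.
    SketchWrapper childWrapperFor(SketchWrapper parentWrapper, SketchRunner child) {
        return child.wrappers.get(parentWrapper.number);
    }
}
```

Because wrapper i of the parent only ever talks to wrapper i of each child,
each worker thread gets its own chain of wrappers and only the delegate
processors are shared.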
h4.EntityProcessor.java, EntityProcessorBase.java
An isPaged() property has been introduced.
h4.EntityProcessorWrapper.java
A protected transformRow() has been extracted from applyTransformer(). I need
to reuse the transformer logic for the paged flow, but applyTransformer() has a
side effect on the rowcache field.
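
A rough illustration of that kind of extraction, assuming rows are plain maps.
TransformSketch is a hypothetical class, and the side effect is modelled as a
simple append to a list; the real applyTransformer() may handle rowcache
differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch; names only loosely mirror the description above.
class TransformSketch {
    private final List<UnaryOperator<Map<String, Object>>> transformers = new ArrayList<>();
    private List<Map<String, Object>> rowCache;  // state touched only by applyTransformer

    void addTransformer(UnaryOperator<Map<String, Object>> t) {
        transformers.add(t);
    }

    // Extracted step: runs every transformer on a copy of the row and
    // touches no fields, so other flows can reuse it safely.
    protected Map<String, Object> transformRow(Map<String, Object> row) {
        Map<String, Object> current = new HashMap<>(row);
        for (UnaryOperator<Map<String, Object>> t : transformers) {
            current = t.apply(current);
        }
        return current;
    }

    // Original-style entry point: same transformation plus the cache side effect.
    Map<String, Object> applyTransformer(Map<String, Object> row) {
        Map<String, Object> result = transformRow(row);
        if (rowCache == null) {
            rowCache = new ArrayList<>();
        }
        rowCache.add(result);
        return result;
    }

    int cachedRows() {
        return rowCache == null ? 0 : rowCache.size();
    }
}
```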
h4.ThreadedEntityProcessorWrapper.java
In addition to all the refactorings above (instantiation and the field move),
it contains the core idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, it
needs to pull the whole page, i.e. all child records that belong to the current
parent,
* the whole page is transformed and put into tepw.rowcache, from which the rows
will be pulled later by the parent entity's tepw
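
The two steps above can be sketched as follows. This is only an illustration
under assumptions: PagedDelegateSketch and PagedWrapperSketch are hypothetical
classes, the delegate is guarded with a plain synchronized block, and the
transformation step is left as a comment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical thread-unaware delegate: hands out child rows one at a time.
class PagedDelegateSketch {
    private final Map<Object, List<Map<String, Object>>> rowsByParent;
    private Iterator<Map<String, Object>> current;

    PagedDelegateSketch(Map<Object, List<Map<String, Object>>> rowsByParent) {
        this.rowsByParent = rowsByParent;
    }

    void init(Object parentKey) {
        List<Map<String, Object>> rows = rowsByParent.get(parentKey);
        current = (rows == null ? Collections.<Map<String, Object>>emptyList() : rows).iterator();
    }

    Map<String, Object> nextRow() {
        return current.hasNext() ? current.next() : null;  // null marks the end of the page
    }
}

// Hypothetical per-thread wrapper that keeps its own row cache.
class PagedWrapperSketch {
    private final PagedDelegateSketch delegate;             // shared between wrappers
    private final Deque<Map<String, Object>> rowCache = new ArrayDeque<>();

    PagedWrapperSketch(PagedDelegateSketch delegate) {
        this.delegate = delegate;
    }

    // Pulls every child row of one parent while holding the delegate's lock,
    // so no other wrapper can interleave its own reads.
    void loadPage(Object parentKey) {
        List<Map<String, Object>> page = new ArrayList<>();
        synchronized (delegate) {
            delegate.init(parentKey);
            Map<String, Object> row;
            while ((row = delegate.nextRow()) != null) {
                page.add(row);
            }
        }
        // A real implementation would transform each row before caching it.
        rowCache.addAll(page);
    }

    // Later reads are served from this wrapper's own cache.
    Map<String, Object> nextRow() {
        return rowCache.poll();
    }
}
```

Pulling the page in one go keeps the time spent holding the shared delegate
short and lets each wrapper hand rows to its parent without further locking.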
h3.Tests
h4.TestThreaded.java
Added a test covering the full space for CachedSqlEP: no threads, 1, 2 and 10
threads (keep in mind that 1 thread is not the same as no threads).
h4.TestEphemeralCache.java
Added a double-destroy() check for EntityProcessors.
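
One way such a check could look, assuming the intent is to detect a processor
whose destroy() is invoked more than once. DestroyOnceSketch is a hypothetical
test double, not a class from the patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical test double: fails loudly if destroy() is called a second time.
class DestroyOnceSketch {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);

    void destroy() {
        // compareAndSet lets exactly one caller win, even across threads.
        if (!destroyed.compareAndSet(false, true)) {
            throw new IllegalStateException("destroy() called twice");
        }
    }

    boolean isDestroyed() {
        return destroyed.get();
    }
}
```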
h4.dataimport-cache-ephemeral.xml
Specifies 10 threads and adds the double-destroy() check for EntityProcessors.
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
> Key: SOLR-2947
> URL: https://issues.apache.org/jira/browse/SOLR-2947
> Project: Solr
> Issue Type: Sub-task
> Components: contrib - DataImportHandler
> Affects Versions: 4.0
> Reporter: Mikhail Khludnev
> Labels: noob
> Fix For: 4.0
>
> Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch,
> dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is to fix multithreaded import with the SQL cache. Here is the
> 2nd stage. If I enable the DocBuilder.EntityRunner flow even for a single
> thread, it breaks pretty basic functionality: the parent-child join.
> the reason is [line 473
> entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup]
> breaks the children's entityProcessor.
> See the attachment comments for more details.