[ 
https://issues.apache.org/jira/browse/SOLR-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888359#comment-17888359
 ] 

Houston Putman commented on SOLR-17102:
---------------------------------------

Hey [~dsmiley] I think the increased efficiency in the executor locking has 
uncovered other bad things we were doing. Basically we were creating URPs in a 
thread local:
{quote}{{ThreadLocal<UpdateRequestProcessor> procThreadLocal =}}
{{ThreadLocal.withInitial(}}
{{() -> {}}
{{var proc = processorChain.createProcessor(req, rsp);}}
{{procPool.add(proc);}}
{{return proc;}}
{{});}}{quote}
 
This means that each thread is going to use the same Req & Rsp for when 
creating the URPs. Since the locking is more efficient, there are now threads 
creating the URPs at the same time, and using the same Req & Rsp objects.
 
In DistributedUpdateProcessor:190, we call 
*DistributedUpdateProcessorFactory.addParamToDistributedRequestWhitelist* which 
will then update the Req object to add "PARAM_WHITELIST_CTX_KEY" to the Solr 
params. Since this request is shared across multiple threads that are creating 
URPs in parallel, this causes a ConcurrentModificationException.

I think this is a great change, but we need to find a way to either create new 
Req & Rsp objects for each thread, or make the lock a little more broad, so 
that the objects aren't used at the same time. I definitely prefer the first 
option. 

> VersionBucket not needed
> ------------------------
>
>                 Key: SOLR-17102
>                 URL: https://issues.apache.org/jira/browse/SOLR-17102
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 9.8
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> SolrCloud ensures that updates for the same document ID are done in the 
> correct order internally in the face of possible re-orders during replication 
> / log replay.  In order to ensure the updates are applied consecutively, a 
> lock is held on a hash of the ID for the doc.  A hash is used to limit the 
> number of total locks because the locks are pre-created in advance for the 
> core (numVersionBuckets == 65k by default).  The memory is non-negligible 
> with many cores, and it introduces the possibility of collisions, especially 
> at lower bucket counts if you configure it much lower.
> Here I propose doing away with a pre-created hashed bucket strategy.  
> Instead, I propose more simply creating and GC'ing a lock per update being 
> processed, and using a ConcurrentHashMap to hold those in-flight.  This 
> strategy is already used in 
> org.apache.solr.util.OrderedExecutor.SparseStripedLock, more or less.
> Doing this is more tractable now that VersionBucket only holds a lock, not a 
> version anymore – SOLR-17036
> The biggest challenge is that the code calls for the ability to use a 
> Condition to away/notify, which means the solution can't just re-use 
> SparseStripedLock above nor be quite so simple.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to