[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492410#comment-16492410 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

Thanks a lot for your review, [~dsmiley]; I have been quite busy recently.
{quote}
- I think the "hash" variable should not be called this to avoid confusion as 
there is no hashing. Maybe just "id" or "lockId"
- Do we still need the Random stuff?
- Maybe rename your "SetBlockingQueue" to "SetSemaphore" or probably better 
"SetLock" as it does not hold anything (Queues hold stuff)
- Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits", as 
it does not extend Lock?
- Can the "closed" state be removed from SetBlockingQueue altogether? It's not 
clear it actually needs to be "closed". It seems wrong; other concurrent 
mechanisms don't have this notion (no Queue, Lock, or Semaphore does, etc.) 
FWIW I stripped this from the class and the test passed.
{quote}
+1

{quote}
Perhaps its better to acquire() the size permit first in add() instead of last 
to prevent lots of producing threads inserting keys into a map only to 
eventually wait. Although it might add annoying try-finally to add() to ensure 
we put the permit back if there's an exception after (e.g. interrupt). Heck; 
maybe that's an issue no matter what the sequence is.
{quote}
I don't think we should do that. {{sizeLock}} acts kind of like a cap on the 
maximum number of producing threads; once we reach that number, it seems better 
to let them wait before trying to enqueue more tasks.
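To make the discussion concrete, here is a minimal sketch of that idea: a semaphore caps the number of in-flight tasks, so producers block in {{add()}} once the cap is reached. The class and field names ({{BoundedExecutorSketch}}, {{sizeSemaphore}}, {{maxInFlight}}) are illustrative, not the actual names in the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: sizeSemaphore caps the number of in-flight tasks,
// so a producer calling add() blocks once the cap is reached.
public class BoundedExecutorSketch {
    private final Semaphore sizeSemaphore;
    private final ExecutorService delegate;

    public BoundedExecutorSketch(int maxInFlight, ExecutorService delegate) {
        this.sizeSemaphore = new Semaphore(maxInFlight);
        this.delegate = delegate;
    }

    public void add(Runnable task) throws InterruptedException {
        sizeSemaphore.acquire();          // blocks when maxInFlight tasks are pending
        delegate.execute(() -> {
            try {
                task.run();
            } finally {
                sizeSemaphore.release();  // permit is released from the delegate's thread
            }
        });
    }
}
```

Note that the permit is acquired in the producer's thread but released in the delegate executor's thread, which is exactly why a {{Semaphore}} rather than a {{Lock}} fits here.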

{quote}
Can the value side of the ConcurrentHashMap be a Lock (I guess ReentrantLock 
impl)? It seems like the most direct concept we want; Semaphore is more than a 
Lock as it tracks permits that we don't need here?
{quote}
We can't. {{Lock}} implementations such as {{ReentrantLock}} only allow us to 
lock and unlock in the same thread. In the OrderedExecutor, we lock first in 
the calling thread and then unlock in a thread of the delegate executor.
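A small demonstration of this constraint, assuming nothing beyond the standard java.util.concurrent APIs: unlocking a {{ReentrantLock}} from a thread that does not own it throws {{IllegalMonitorStateException}}, while a {{Semaphore}} permit may be released from any thread.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: ReentrantLock is owner-bound, Semaphore permits are not.
public class CrossThreadUnlockDemo {
    // Returns true if unlocking from a non-owner thread threw, as expected.
    public static boolean unlockFromOtherThreadThrows() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();                       // locked by this (the caller's) thread
        final boolean[] threw = {false};
        Thread other = new Thread(() -> {
            try {
                lock.unlock();             // different thread: not the owner
            } catch (IllegalMonitorStateException e) {
                threw[0] = true;
            }
        });
        other.start();
        other.join();
        lock.unlock();                     // the owning thread may still unlock
        return threw[0];
    }

    // Returns true if a permit acquired here was released by another thread.
    public static boolean releaseFromOtherThreadWorks() throws InterruptedException {
        Semaphore permit = new Semaphore(1);
        permit.acquire();                  // acquired in the caller's thread
        Thread other = new Thread(permit::release);  // any thread may release
        other.start();
        other.join();
        return permit.availablePermits() == 1;
    }
}
```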

{quote}
The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare 
in practice, I wonder if we can do something simpler? You may get lucky with 
the map.compute* methods on ConcurrentHashMap, which execute the lambda 
atomically. Though I don't know if it's bad to block if we try to acquire a 
lock within there. I see remove() removes the value from the Map, but perhaps 
if the value were a mechanism that tracked that there's a producer pending, 
then we would not remove the value from the map? If we did this, then maybe 
that would simplify add()? I'm not sure.
{quote}
I will think more about this.
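For reference, one shape the compute* alternative could take, as a hedged sketch rather than a proposed patch: {{computeIfAbsent}} gets-or-creates the per-key semaphore atomically, eliminating the putIfAbsent retry loop, and the blocking {{acquire()}} happens outside the lambda (blocking inside compute* would stall other updates mapped to the same bin). The names ({{PerKeyLockSketch}}, {{lockId}}) are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: per-key semaphores obtained via computeIfAbsent,
// so there is no putIfAbsent spin loop. Entries are never removed here,
// which sidesteps the remove()/putIfAbsent race entirely at the cost of
// the map growing with the number of distinct keys seen.
public class PerKeyLockSketch {
    private final ConcurrentHashMap<Integer, Semaphore> locks = new ConcurrentHashMap<>();

    public void lock(int lockId) throws InterruptedException {
        // get-or-create is atomic; the lambda must not block
        Semaphore s = locks.computeIfAbsent(lockId, k -> new Semaphore(1));
        s.acquire();   // blocking happens outside the atomic lambda
    }

    public void unlock(int lockId) {
        Semaphore s = locks.get(lockId);
        if (s != null) {
            s.release();  // may run in the delegate executor's thread
        }
    }
}
```

Whether never removing entries is acceptable depends on the key distribution; for document-id hashes it may need a bounded or weakly referenced map instead.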

> Replay buffering tlog in parallel
> ---------------------------------
>
>                 Key: SOLR-12338
>                 URL: https://issues.apache.org/jira/browse/SOLR-12338
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different ids are independent, it is safe to replay them 
> in parallel. This will significantly reduce the recovery time of replicas in 
> high-load indexing environments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
