[ https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530222#comment-13530222 ]

Jay Shrauner commented on ZOOKEEPER-1505:
-----------------------------------------

Alex- The race condition is within FinalRequestProcessor on any node--Leader, 
Follower, or Observer. It has nothing to do with the serialization order 
imposed by the leader: setting and firing a watch are write operations on 
state maintained locally on each node. What happens is, say, client A toggles 
the value of node /X from 1 to 2 while client B reads /X and sets a watch on 
it. Client B always sees a consistent view; it may, however, not receive a 
watch firing, so it may never know to read value 2. If client B relies on 
timely watch firings to keep its data fresh, this is a problem.
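
To make the scenario concrete, here is a minimal sketch (illustrative only, 
not from the patch) of the pattern client B is assumed to follow: read /X and 
arm the watch in a single getData call, re-reading whenever the watch fires.

{code:java}
// Minimal sketch of client B's assumed pattern: read /X and set a watch in
// one getData call, then re-read whenever a NodeDataChanged event arrives.
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class WatchingReader implements Watcher {
    private final ZooKeeper zk;
    private volatile byte[] latest;

    public WatchingReader(ZooKeeper zk) {
        this.zk = zk;
    }

    // Read the current value and (re)arm the watch in a single request.
    public void refresh() throws KeeperException, InterruptedException {
        latest = zk.getData("/X", this, new Stat());
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                refresh();
            } catch (KeeperException | InterruptedException e) {
                // error handling elided in this sketch
            }
        }
    }
}
{code}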

1. It is possible for thread C1 to process client B reading value 1 and 
setting the watch; for thread C2 to process client A writing 2 to /X, firing 
the watch, and pushing the notification onto client B's network stack (the 
watch firing); and finally for thread C1 to push the read of value 1 onto 
client B's network stack. Because the response to the getData-and-set-watch 
call arrives after the watch has fired, the client may ignore the watch 
firing. For example, say client B was originally responding to a watch firing 
on /X. From its point of view, it sees the /X watch fire, it sends a getData 
request, it sees the /X watch fire again (which it ignores, because it 
already has a getData outstanding), and finally it gets the response to its 
getData request--carrying the stale value 1, with no further watch firing 
ever coming.
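
A hedged sketch of the client-side logic assumed in case (1)--dropping a 
watch event while a read is outstanding--looks like this:

{code:java}
// Sketch of the assumed client logic in case (1): a watch event is ignored
// while a getData is already outstanding. If the server fired the watch
// before answering that getData, the answer still carries the stale value
// and no further event will ever arrive.
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.zookeeper.*;
import org.apache.zookeeper.AsyncCallback.DataCallback;
import org.apache.zookeeper.data.Stat;

public class DedupingReader implements Watcher, DataCallback {
    private final ZooKeeper zk;
    private final AtomicBoolean readOutstanding = new AtomicBoolean(false);

    public DedupingReader(ZooKeeper zk) {
        this.zk = zk;
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged
                && readOutstanding.compareAndSet(false, true)) {
            // Re-read and re-arm the watch; further firings are ignored
            // until this read completes.
            zk.getData("/X", this, this, null);
        }
    }

    @Override
    public void processResult(int rc, String path, Object ctx, byte[] data,
                              Stat stat) {
        readOutstanding.set(false);
        // With the reordering in case (1), 'data' here is the stale value 1
        // and the watch firing for value 2 was already consumed above.
    }
}
{code}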

2. It is also possible for client B to read value 1, for client A to write 
value 2 and check for watch firings (finding none yet registered), and only 
then for client B to reset the watch. There is no locking guarding the 
atomicity of client B reading /X and setting the watch on /X, so client B 
ends up waiting on a watch for a change that has already happened.

It is relatively straightforward to add locking to prevent case (2), but for 
case (1) I think we need to restrict parallelism in FinalRequestProcessor.
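
As an illustration of the case (2) fix, here is a minimal sketch (the guard 
object is hypothetical, not the actual patch) that serializes "read value + 
register watch" against "write value + fire watches":

{code:java}
// Hypothetical sketch of the case (2) fix: a guard that makes
// "read + register watch" atomic with respect to "write + fire watches".
// The DataTree calls are the real API; the lock itself is the assumption.
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.data.Stat;
import org.apache.zookeeper.server.DataTree;

public class GuardedTree {
    private final DataTree tree;
    private final Object guard = new Object(); // hypothetical; per-path in practice

    public GuardedTree(DataTree tree) {
        this.tree = tree;
    }

    // Client B's path: read the value and arm the watch atomically.
    public byte[] readAndWatch(String path, Stat stat, Watcher w)
            throws KeeperException.NoNodeException {
        synchronized (guard) {
            return tree.getData(path, stat, w);
        }
    }

    // Client A's path: write the value and fire watches under the same lock,
    // so no watch registration can slip in between B's read and A's write.
    public Stat write(String path, byte[] data, int version, long zxid,
                      long time) throws KeeperException.NoNodeException {
        synchronized (guard) {
            return tree.setData(path, data, version, zxid, time);
        }
    }
}
{code}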

We can improve the parallelism here further, but this reached the point where 
I wanted to leave it for a future Jira. If we could identify which read 
requests set watches, and treat those as a third request type, we could allow 
pure read requests from client B to be processed concurrently with write 
requests from client A. The current code only fully parses getData and the 
other read request types in FinalRequestProcessor, so we would need to move 
that parsing earlier in the pipeline, which might have performance 
implications.
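
A rough sketch of such a classifier (a hypothetical helper; note that a real 
version would have to rewind the request buffer after peeking, and this early 
deserialization is exactly the performance concern above):

{code:java}
// Hypothetical classifier for the "third request type": does this read
// request also set a watch? GetDataRequest/ExistsRequest carry a watch
// flag; a real version would need to rewind request.request after peeking.
import java.io.IOException;
import org.apache.zookeeper.ZooDefs.OpCode;
import org.apache.zookeeper.proto.ExistsRequest;
import org.apache.zookeeper.proto.GetDataRequest;
import org.apache.zookeeper.server.ByteBufferInputStream;
import org.apache.zookeeper.server.Request;

public class ReadClassifier {
    public static boolean setsWatch(Request request) throws IOException {
        switch (request.type) {
        case OpCode.getData: {
            GetDataRequest r = new GetDataRequest();
            ByteBufferInputStream.byteBuffer2Record(request.request, r);
            return r.getWatch();
        }
        case OpCode.exists: {
            ExistsRequest r = new ExistsRequest();
            ByteBufferInputStream.byteBuffer2Record(request.request, r);
            return r.getWatch();
        }
        // getChildren/getChildren2 have the same shape and are elided here.
        default:
            return false;
        }
    }
}
{code}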

> Multi-thread CommitProcessor
> ----------------------------
>
>                 Key: ZOOKEEPER-1505
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.3, 3.4.4, 3.5.0
>            Reporter: Jay Shrauner
>            Assignee: Jay Shrauner
>              Labels: performance, scaling
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch, 
> ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch
>
>
> CommitProcessor has a single thread that both pulls requests off its queues 
> and runs all downstream processors. This is noticeably inefficient for 
> read-intensive workloads, which could be run concurrently. The trick is 
> handling write transactions. I propose multi-threading this code according to 
> the following two constraints:
>   - each session must see its requests responded to in order
>   - all committed transactions must be handled in zxid order, across all 
> sessions
> I believe these cover the only constraints we need to honor. In particular, I 
> believe we can relax the following:
>   - it does not matter if the read request in one session happens before or 
> after the write request in another session
> With these constraints, I propose the following threads:
>   - 1    primary queue servicing/work dispatching thread
>   - 0-N  assignable worker threads, where a given session is always assigned 
> to the same worker thread
> By always assigning sessions to the same worker thread (using a simple 
> sessionId mod number-of-worker-threads mapping), we guarantee the first 
> constraint: requests pushed onto a thread's queue are processed in order. We 
> guarantee the second constraint by allowing only a single commit transaction 
> to be in flight at a time--the queue-servicing thread blocks while a commit 
> transaction is in flight and clears the flag when the transaction completes.
> On a 32-core machine running Linux 2.6.38, best performance was achieved 
> with 32 worker threads, for a 56% +/- 5% improvement in throughput (this 
> improvement was measured on top of that for ZOOKEEPER-1504, not in 
> isolation).
> New classes introduced in this patch are:
>     WorkerService (also in ZOOKEEPER-1504): an ExecutorService wrapper that 
> makes worker threads daemon threads and names them in an easily debuggable 
> manner. Supports assignable threads (as used here) and non-assignable 
> threads (as used by NIOServerCnxnFactory).
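
For reference, a minimal sketch of the dispatch scheme described above 
(hypothetical class names; the actual patch builds on WorkerService): 
sessions pin to workers by sessionId mod N, and at most one committed 
transaction is in flight at a time.

{code:java}
// Minimal sketch of the scheme described above (hypothetical names).
// Sessions pin to a worker by sessionId mod N (constraint 1); a semaphore
// keeps at most one commit transaction in flight at a time (constraint 2).
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class CommitDispatcher {
    private final ExecutorService[] workers;
    private final Semaphore commitInFlight = new Semaphore(1);

    public CommitDispatcher(int numWorkers) {
        workers = new ExecutorService[numWorkers];
        for (int i = 0; i < numWorkers; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    private ExecutorService workerFor(long sessionId) {
        // Same session always maps to the same single-threaded worker, so
        // that session's responses stay in order.
        return workers[(int) (Math.abs(sessionId) % workers.length)];
    }

    public void dispatchRead(long sessionId, Runnable processRead) {
        workerFor(sessionId).execute(processRead);
    }

    // Called by the queue-servicing thread in zxid order; blocks while a
    // previous commit is still in flight.
    public void dispatchCommit(long sessionId, Runnable processCommit)
            throws InterruptedException {
        commitInFlight.acquire();
        workerFor(sessionId).execute(() -> {
            try {
                processCommit.run();
            } finally {
                commitInFlight.release(); // commit done; unblock dispatcher
            }
        });
    }
}
{code}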
