[
https://issues.apache.org/jira/browse/HADOOP-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188386#comment-17188386
]
Wang, Xinglong commented on HADOOP-17237:
-----------------------------------------
[~linyiqun]
> Improve NameNode RPC throughput with ReadWriteRpcCallQueue
> -----------------------------------------------------------
>
> Key: HADOOP-17237
> URL: https://issues.apache.org/jira/browse/HADOOP-17237
> Project: Hadoop Common
> Issue Type: Improvement
> Components: rpc-server
> Reporter: Wang, Xinglong
> Priority: Major
>
> *Current*
> In our production cluster, the typical read-to-write ratio is
> 10:1, and it sometimes goes up to 30:1.
> The NameNode uses ReentrantReadWriteLock under the hood of FSNamesystemLock.
> The read lock is a shared lock, while the write lock is an exclusive lock.
> Read and write RPCs arrive at the NameNode in random order, so reads and
> writes are interleaved, and only a small fraction of reads can actually
> share the read lock.
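
To make the shared/exclusive behavior described above concrete, here is a minimal standalone sketch using only java.util.concurrent. The class and method names (ReadWriteLockDemo, probeWhileReadLocked) are hypothetical and not part of Hadoop. While one thread holds the read lock, a second thread can also take the read lock, but cannot take the write lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class for illustration; not part of Hadoop.
class ReadWriteLockDemo {
    /**
     * Holds the read lock on the calling thread, then probes from a second
     * thread. Returns {secondReaderAcquired, writerAcquired}.
     */
    static boolean[] probeWhileReadLocked() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        ExecutorService other = Executors.newSingleThreadExecutor();
        lock.readLock().lock(); // the calling thread now holds the read lock
        try {
            return other.submit(() -> {
                // A second reader can share the read lock.
                boolean read = lock.readLock().tryLock();
                if (read) {
                    lock.readLock().unlock();
                }
                // A writer cannot get in while another thread is reading.
                boolean write = lock.writeLock().tryLock();
                if (write) {
                    lock.writeLock().unlock();
                }
                return new boolean[] {read, write};
            }).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();
            other.shutdown();
        }
    }
}
```

Calling ReadWriteLockDemo.probeWhileReadLocked() should report that the second reader acquired the lock and the writer did not, which is why a write arriving between reads splits them into separate lock acquisitions.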
> Currently we have the default call queue and FairCallQueue, and we can
> refreshCallQueue on the fly. This leaves room to design a new call queue.
> *Idea*
> If we reorder the RPC calls in the call queue so that read RPCs are grouped
> together and write RPCs are grouped together, we gain some control: a batch
> of read RPCs reaches the handlers together and can possibly share the same
> read lock. This reduces the fragmentation of read locks.
> This only improves the chance of sharing the read lock within a batch of
> read RPCs, because some NameNode-internal write lock acquisitions do not go
> through the call queue.
> ReentrantReadWriteLock maintains a queue of the threads waiting for the
> lock. Here is an example.
> R: stands for read rpc
> W: stands for write rpc
> Unordered:
> RRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRW
> In this case, we need 16 lock timeslices (8 shared read slices plus 8 write
> slices).
> Optimized:
> RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWWWWWWWW
> In this case, we only need 9 lock timeslices (1 shared read slice plus 8
> write slices).
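
The counting in the example above can be reproduced with a short sketch. It assumes the simplified model the example implies: every run of consecutive reads shares a single read-lock timeslice, and every write takes its own exclusive timeslice. Handler count and lock fairness are ignored. The class name LockTimesliceCounter is hypothetical.

```java
// Hypothetical helper for illustration; models the example, not the NameNode.
class LockTimesliceCounter {
    /**
     * Counts lock timeslices for a sequence of 'R' (read) and 'W' (write)
     * calls, assuming consecutive reads share one timeslice and each write
     * needs an exclusive timeslice of its own.
     */
    static int countTimeslices(String calls) {
        int slices = 0;
        char prev = 0; // no previous call yet
        for (char c : calls.toCharArray()) {
            // A write always starts a new slice; a read starts a new slice
            // only when it does not directly follow another read.
            if (c == 'W' || prev != 'R') {
                slices++;
            }
            prev = c;
        }
        return slices;
    }
}
```

Under this model, eight repetitions of RRRRW give 16 timeslices, while 32 reads followed by 8 writes give 9.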
> *Correctness*
> Since the execution order of any two concurrent or queued RPCs in the
> NameNode is not guaranteed, we can reorder the RPCs in the call queue into a
> read group and a write group, and then dequeue from these two queues
> according to a designed strategy. For example: dequeue 100 read RPCs, then
> 5 write RPCs, then reads again, then writes again.
> FairCallQueue also reorders RPC calls in the call queue, so for this part I
> think both rely on the same reasoning to guarantee correct RPC results.
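
The dequeue strategy described above (a batch of reads, then a batch of writes, and so on) could look roughly like the following sketch. It is not the ReadWriteRpcCallQueue from the patch: the class name ReadWriteBatchQueue, its methods, and the isRead flag are assumptions made for illustration, and a real call queue would also need blocking semantics for the RPC handlers.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; not the actual implementation proposed in the issue.
class ReadWriteBatchQueue<E> {
    private final Queue<E> reads = new ArrayDeque<>();
    private final Queue<E> writes = new ArrayDeque<>();
    private final int readBatch;   // e.g. 100 in the description above
    private final int writeBatch;  // e.g. 5 in the description above
    private boolean servingReads = true;
    private int servedInBatch = 0;

    ReadWriteBatchQueue(int readBatch, int writeBatch) {
        this.readBatch = readBatch;
        this.writeBatch = writeBatch;
    }

    /** Enqueues a call into the read group or the write group. */
    synchronized void put(E call, boolean isRead) {
        (isRead ? reads : writes).add(call);
    }

    /** Returns the next call, or null if both groups are empty. */
    synchronized E poll() {
        Queue<E> current = servingReads ? reads : writes;
        Queue<E> other = servingReads ? writes : reads;
        int limit = servingReads ? readBatch : writeBatch;
        boolean batchDone = servedInBatch >= limit;
        if ((current.isEmpty() || batchDone) && !other.isEmpty()) {
            // Switch groups when the batch is used up or the group ran dry.
            servingReads = !servingReads;
            servedInBatch = 0;
            current = other;
        } else if (batchDone) {
            // Nothing waiting in the other group: start a fresh batch here.
            servedInBatch = 0;
        }
        E call = current.poll();
        if (call != null) {
            servedInBatch++;
        }
        return call;
    }
}
```

With readBatch = 3 and writeBatch = 1, calls enqueued as r1, w1, r2, r3, w2, r4, r5 come out as r1, r2, r3, w1, r4, r5, w2. Order within each group is preserved; only the relative order of reads and writes changes.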
> *Performance*
> In a test environment, we see a 15% - 20% NameNode RPC throughput
> improvement compared with the default call queue.
> The test traffic is 30 read : 3 write : 1 list, generated with
> NNLoadGeneratorMR.
> This result is not surprising: because some write RPCs are not managed by
> the call queue, we cannot reorder them by reordering calls in the call queue.
> However, we could still do a full read/write reordering if we redesigned
> ReentrantReadWriteLock to support it. That would be a further step after
> this one.