[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297483#comment-16297483
 ] 

Erik Krogen commented on HDFS-12943:
------------------------------------

We have been running some performance experiments (using 
[Dynamometer|https://lists.apache.org/thread.html/7223d22fbc26e055369695f83395e9a7767043f7245af25df385b535@%3Chdfs-dev.hadoop.apache.org%3E])
 to estimate how large the potential benefits of this feature are. Using the 
tool, we replayed a few hours of traces from a production cluster against a 
simulated NameNode, filtering out varying percentages of read requests to 
mimic the ANN's view of the workload when those reads go to the standby 
instead. We filtered out 0%, 20%, 50%, and 100% of read requests, and also 
replayed the write-only workload at 2x and 4x speed to estimate throughput 
under ideal conditions (all reads offloaded).
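
As a rough illustration of the filtering and speed-up steps described above, 
here is a minimal sketch. It is not Dynamometer's code or API: the trace 
representation (a list of (timestamp_ms, op) tuples), the READ_OPS set, and 
the function names filter_reads and speed_up are all hypothetical and chosen 
only for this example.

```python
import random

# Hypothetical set of read-only operation names. Which commands count as
# "reads" in the real experiment is an assumption made for this sketch.
READ_OPS = {"open", "getfileinfo", "listStatus", "contentSummary"}


def filter_reads(trace, skip_fraction, seed=0):
    """Drop roughly skip_fraction of the read requests; keep every write."""
    if not 0.0 <= skip_fraction <= 1.0:
        raise ValueError("skip_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so that a replay is repeatable
    kept = []
    for timestamp_ms, op in trace:
        if op in READ_OPS and rng.random() < skip_fraction:
            continue  # this read is assumed to be served by the standby
        kept.append((timestamp_ms, op))
    return kept


def speed_up(trace, factor):
    """Compress inter-request gaps so the trace replays `factor` times faster."""
    if not trace:
        return []
    start = trace[0][0]
    return [(start + (t - start) / factor, op) for t, op in trace]
```

Under these assumptions, the "100% Skip (4x)" configuration would correspond 
to speed_up(filter_reads(trace, 1.0), 4).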

|| ||0% Skip||20% Skip||50% Skip||100% Skip||100% Skip (2x)||100% Skip (4x)||
|Average Write Latency (ms)|52.8|28.5|18.0|14.0|27.0|73.2|
|Average Read Latency (ms)|34.3|20.0|11.5|N/A|N/A|N/A|
|RPC Queue AvgTime (ms)|23.0|11.9|7.4|1.7|4.3|20.7|
|RPC Queue 50th Percentile (ms)|2.81|0.52|0.47|0.05|0.05|0.04|
|RPC Queue 90th Percentile (ms)|24.42|12.51|9.98|0.12|1.49|12.96|
|RPC Queue NumOps (k)|31.0|25.2|16.3|1.5|3.0|6.0|
|LockQueueLength Average|45.3|24.9|18.9|7.0|12.5|30.6|
|GC Time (ms)|9.62|7.94|6.13|1.94|3.03|5.49|

The results above suggest that, if we were able to offload all read requests, 
we could expect up to a 4x throughput improvement for the write workload.
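
To make the comparison against the baseline explicit, the short sketch below 
expresses two rows of the table as ratios to the "0% Skip" column. The 
numbers are copied from the table; the helper name relative_to_baseline is 
hypothetical.

```python
# Values copied from the table above, in milliseconds.
COLUMNS = ["0% Skip", "20% Skip", "50% Skip", "100% Skip",
           "100% Skip (2x)", "100% Skip (4x)"]
write_latency_ms = dict(zip(COLUMNS, [52.8, 28.5, 18.0, 14.0, 27.0, 73.2]))
rpc_queue_avg_ms = dict(zip(COLUMNS, [23.0, 11.9, 7.4, 1.7, 4.3, 20.7]))


def relative_to_baseline(metric, baseline="0% Skip"):
    """Return each column's value as a ratio of the baseline column."""
    base = metric[baseline]
    return {name: round(value / base, 2) for name, value in metric.items()}


if __name__ == "__main__":
    for label, metric in [("write latency", write_latency_ms),
                          ("RPC queue avg", rpc_queue_avg_ms)]:
        print(label, relative_to_baseline(metric))
```

At 4x replay speed with all reads skipped, the average RPC queue time is 
about 0.9x the baseline, while the average write latency is about 1.39x the 
baseline, so the 4x figure reads best as an upper bound.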

> Consistent Reads from Standby Node
> ----------------------------------
>
>                 Key: HDFS-12943
>                 URL: https://issues.apache.org/jira/browse/HDFS-12943
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Konstantin Shvachko
>         Attachments: ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



