[ https://issues.apache.org/jira/browse/HADOOP-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618038#action_12618038 ]

Konstantin Shvachko commented on HADOOP-3860:
---------------------------------------------

I benchmarked three operations: _create_, _rename_, and _delete_ using 
{{NNThroughputBenchmark}}, which is a pure name-node benchmark. It calls the 
name-node methods directly without going through the RPC protocol, so the *rpc 
overhead is not included* in these results and should be measured separately, 
say, with a synthetic load generator. 
In a sense these benchmarks determine an upper bound for HDFS operations, 
namely the maximum throughput the name-node can sustain under heavy load.

Each run starts with an empty file system and performs 1 million operations 
handled by 256 threads on the name-node. The output is the throughput, that is, 
the number of operations per second, calculated as 1,000,000/(tE - tB), 
where tB is when the first thread starts and tE is when all threads stop. The 
threads run in parallel.
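The timing scheme can be sketched as follows. This is a hypothetical illustration of the measurement described above, not the actual {{NNThroughputBenchmark}} code; the thread and operation counts are scaled down, and the "operation" is a stand-in counter increment:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the throughput measurement: tB is taken before the first
// thread starts, tE after all threads stop, and the reported throughput
// is totalOps / (tE - tB).
public class ThroughputSketch {
    static final int THREADS = 8;           // the benchmark used 256
    static final int OPS_PER_THREAD = 1000; // the benchmark ran 1M ops total

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        AtomicLong counter = new AtomicLong(); // stand-in for a name-node op
        CountDownLatch done = new CountDownLatch(THREADS);
        long tB = System.nanoTime();
        for (int t = 0; t < THREADS; t++) {
            pool.execute(() -> {
                for (int i = 0; i < OPS_PER_THREAD; i++) {
                    counter.incrementAndGet(); // perform one "operation"
                }
                done.countDown();
            });
        }
        done.await();                          // wait for all threads to stop
        long tE = System.nanoTime();
        pool.shutdown();
        double elapsedSec = (tE - tB) / 1e9;
        System.out.printf("%d ops in %.3f s = %.0f ops/sec%n",
                counter.get(), elapsedSec, counter.get() / elapsedSec);
    }
}
```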
Creates create empty files and do not close them. Renames change file names 
but do not move the files.
All test results are consistent except for one distortion in deletes on a 
remote drive, which is way out of the expected range. I do not know what 
causes it: on some days the numbers were good, on others not.

Each test consists of 1,000,000 operations performed using 256 threads.
Results are in *ops/sec*.
||Log to        ||open  ||create (no close)     ||rename        ||delete||
|none           | 126,119| | | |
|1 Local HD     | |5,710        |8,400  |20,690|
|1 NFS HD       | |5,600        |8,290  |12,090|
|1 NFS Filer    | |5,676        |8,134  |21,100|
|4 Local HD     | |5,210| | |
|3 local HD, 1 NFS HD  | |5,150| | |

Some conclusions:
-	Local drive is faster than NFS, and
-	an NFS filer is faster than a remote drive;
-	but *the difference between NFS storage and local drives is very slim, 
only 2-3%*.
-	*Using 4 local drives instead of 1 degrades the performance by only 
9%*, even though we write onto the drives sequentially (one after another).
_It would be fair to say that there is some parallelism in writing, since 
the current code batches writes first and then syncs them at once in large 
chunks. So while the writes are sequential, the syncs are parallel._
-	Opens (getBlockLocations()) are 22 times faster than creates,
-	which means *journaling is the real bottleneck* for name-node 
operations,
-	and the *lack of fine-grained locking in the namespace data structures 
is not a problem* so far. Otherwise, the throughputs for opens and the other 
operations would be the same or at least close.
-	Further optimization of name-node performance should, in my opinion, 
focus on *efficient journaling*.
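The batching effect mentioned above can be illustrated with a toy group-commit sketch. This is a hypothetical illustration, not the actual edit-log code: edit records are appended to an in-memory buffer (cheap), and a single sync covers the whole batch, which is why adding journal directories costs little per operation:

```java
import java.util.ArrayList;
import java.util.List;

// Toy group-commit journal: many buffered writes, one sync per batch.
// Stand-in for the behavior described in the comment, not FSEditLog itself.
public class GroupCommitSketch {
    final List<String> buffer = new ArrayList<>();
    int syncCalls = 0;

    void logEdit(String record) {
        buffer.add(record);   // batched in-memory write, no disk cost yet
    }

    void sync() {
        syncCalls++;          // one (expensive) flush covers the whole batch
        buffer.clear();       // pretend the records reached stable storage
    }

    public static void main(String[] args) {
        GroupCommitSketch journal = new GroupCommitSketch();
        for (int batch = 0; batch < 10; batch++) {
            for (int op = 0; op < 100; op++) {
                journal.logEdit("create /file-" + batch + "-" + op);
            }
            journal.sync();   // 100 edits amortized over a single sync
        }
        // prints "1000 edits, 10 syncs"
        System.out.println("1000 edits, " + journal.syncCalls + " syncs");
    }
}
```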

Here is another set of statistics, which characterizes the actual load on the 
name-node on some of our clusters. Unfortunately, the statistics for open are 
broken, and we do not collect stats for renames, so I can only present creates 
and deletes. Please contribute if somebody has more data.

||Actual load (ops/sec)||open   ||create        ||delete||
|peak   | |144  |6460|
|average| |11   |50|

-	These numbers show that the actual peak load for creates is about 40 
times lower than what the name-node can handle, and 3 times lower for deletes. 
On average the picture is even more drastic. 
*The name-node's processing capability is 400-500 times higher than the actual 
average load on it.*
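The quoted ratios can be checked against the two tables (single-local-drive throughput vs. observed cluster load):

```java
// Sanity check of the headroom ratios: benchmark capacity on 1 local HD
// divided by the observed peak and average loads from the tables above.
public class LoadRatioCheck {
    public static void main(String[] args) {
        double createCapacity = 5710, deleteCapacity = 20690; // ops/sec
        double createPeak = 144, deletePeak = 6460;           // observed peaks
        double createAvg = 11, deleteAvg = 50;                // observed averages

        System.out.printf("create peak headroom:    %.0fx%n",
                createCapacity / createPeak);   // ~40x
        System.out.printf("delete peak headroom:    %.1fx%n",
                deleteCapacity / deletePeak);   // ~3.2x
        System.out.printf("create average headroom: %.0fx%n",
                createCapacity / createAvg);    // ~519x
        System.out.printf("delete average headroom: %.0fx%n",
                deleteCapacity / deleteAvg);    // ~414x
    }
}
```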


> Compare name-node performance when journaling is performed into local 
> hard-drives or nfs.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3860
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3860
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.19.0
>
>         Attachments: NNThruputMoreOps.patch
>
>
> The goal of this issue is to measure how the name-node performance depends on 
> where the edits log is written to.
> Three types of the journal storage should be evaluated:
> # local hard drive;
> # remote drive mounted via nfs;
> # nfs filer.

-- 
This message is automatically generated by JIRA.