Pure name-node benchmarks.
--------------------------

                 Key: HADOOP-2149
                 URL: https://issues.apache.org/jira/browse/HADOOP-2149
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.16.0
            Reporter: Konstantin Shvachko
            Assignee: Konstantin Shvachko
             Fix For: 0.16.0


h3. Pure name-node benchmark.

This patch starts a series of name-node benchmarks.
The intention is to have a separate benchmark for every important name-node 
operation.
The purpose of the benchmarks is
# to measure the throughput of each name-node operation, and
# to evaluate changes in name-node performance (gain or degradation) when 
optimization or new-functionality patches are introduced.

The benchmarks measure name-node throughput (operations per second) and the 
average execution time.
The benchmark does not involve any Hadoop components other than the name-node.
The name-node server is real; the other components are simulated.
There is no RPC overhead: each operation is executed by calling the 
respective name-node method directly.
The benchmark is multi-threaded, that is, one can start multiple threads 
competing for name-node resources by concurrently executing the same 
operation on different data.
See the javadoc for more details.

The patch contains implementations of two name-node operations: file creates 
and block reports.
Implementations of other operations will follow.
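
As a rough illustration of the threading structure (not the patch's actual code; the class and the stand-in operation here are hypothetical), each thread performs its share of the total operations against the shared name-node, and throughput is derived from the wall-clock time:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the multi-threaded benchmark loop. The real patch calls
// name-node methods such as create() directly; here a synchronized no-op
// stands in for the name-node so the sketch is self-contained.
public class NameNodeBenchSketch {
  static final Object NAME_NODE_LOCK = new Object(); // stand-in for the name-node
  static long executedOps = 0;

  // Hypothetical stand-in for a direct name-node method call (no RPC).
  static void doOneOp() {
    synchronized (NAME_NODE_LOCK) {
      executedOps++; // threads compete for the same lock, as for name-node resources
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int numThreads = 4;
    int totalOps = 10000; // same total work regardless of thread count
    int opsPerThread = totalOps / numThreads;

    List<Thread> threads = new ArrayList<>();
    long start = System.currentTimeMillis();
    for (int i = 0; i < numThreads; i++) {
      Thread t = new Thread(() -> {
        for (int op = 0; op < opsPerThread; op++) {
          doOneOp();
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      t.join();
    }
    long elapsedMsec = Math.max(1, System.currentTimeMillis() - start);
    long opsPerSec = totalOps * 1000L / elapsedMsec;
    System.out.println("elapsed msec: " + elapsedMsec + ", ops/sec: " + opsPerSec);
  }
}
```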

h3. File creation benchmark.

I ran two series of the file create benchmark on the name-node with 
different numbers of threads.
The first series ran on the regular name-node, which performs an edits log 
transaction on every create.
The transaction includes a synch to disk.
In the second series the name-node was modified so that the synchs are 
turned off.
Each run of the benchmark performs the same total of 10,000 creates, equally 
distributed among the running threads. I used a 4-core 2.8 GHz machine.
The following two tables summarize the results. Time is in milliseconds.

|| threads || time (msec)\\with synch || ops/sec\\with synch ||
| 1 | 13074 | 764 |
| 2 | 8883 | 1125 |
| 4 | 7319 | 1366 |
| 10 | 7094 | 1409 |
| 20 | 6785 | 1473 |
| 40 | 6776 | 1475 |
| 100 | 6899 | 1449 |
| 200 | 7131 | 1402 |
| 400 | 7084 | 1411 |
| 1000 | 7181 | 1392 |

|| threads || time (msec)\\no synch || ops/sec\\no synch ||
| 1 | 4559 | 2193 |
| 2 | 4979 | 2008 |
| 4 | 5617 | 1780 |
| 10 | 5679 | 1760 |
| 20 | 5550 | 1801 |
| 40 | 5804 | 1722 |
| 100 | 5871 | 1703 |
| 200 | 6037 | 1656 |
| 400 | 5855 | 1707 |
| 1000 | 6069 | 1647 |
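
The ops/sec columns above are simply the total operation count divided by the elapsed time. A small helper (hypothetical, for illustration) reproduces the first row of each table:

```java
// ops/sec = total operations / elapsed seconds, with integer truncation.
public class ThroughputCalc {
  static long opsPerSec(long totalOps, long elapsedMsec) {
    return totalOps * 1000L / elapsedMsec;
  }

  public static void main(String[] args) {
    // First row of Table 1: 10,000 creates in 13,074 msec
    System.out.println(opsPerSec(10000, 13074)); // prints 764
    // First row of Table 2: 10,000 creates in 4,559 msec
    System.out.println(opsPerSec(10000, 4559));  // prints 2193
  }
}
```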

The results show:
# (Table 1) The new synchronization mechanism that batches synch calls from 
different threads works well.
With one thread every synch causes real IO, which makes it slow. The more 
threads are used, the more synchs are
batched, resulting in better performance. Performance grows up to a certain 
point and then stabilizes
at about 1450 ops/sec.
# (Table 2) Operations that do not require disk IO are constrained by memory 
locks.
Without synchs the single-threaded execution is the fastest, because there 
are no waits.
More threads start to interfere with each other and have to wait.
Again the performance stabilizes, at about 1700 ops/sec, and does not 
degrade further.
# Our default of 10 handlers per name-node is not the best choice for either 
the IO-bound or the pure
in-memory operations. We should increase the default to 20 handlers, and on 
big clusters 100 handlers
or more can be used without loss of performance. In fact, with more handlers 
more operations can be handled
simultaneously, which prevents the name-node from dropping calls that are 
close to timeout.
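
The batching in point 1 can be sketched as a classic group commit: every edit gets a transaction id, and a sync returns immediately if a flush by another thread has already covered that id. A simplified, self-contained version follows (the real logic lives in the name-node's edits log, which also releases the lock during the disk write so new edits can accumulate; all names here are illustrative):

```java
// Simplified group-commit sketch: many threads log an edit and then sync;
// a sync that finds its transaction already covered by a previous flush
// returns without doing the expensive (disk) work.
public class GroupCommitSketch {
  private long nextTxId = 0;   // id handed to each logged edit
  private long syncedTxId = 0; // highest id made durable so far
  static long realSyncs = 0;   // how many actual flushes happened

  synchronized long logEdit() {
    return ++nextTxId;         // append edit to the buffer, return its txid
  }

  void sync(long myTxId) {
    synchronized (this) {
      if (myTxId <= syncedTxId) {
        return;                // another thread's flush already covered us: batched
      }
      realSyncs++;             // stand-in for the real disk synch
      syncedTxId = nextTxId;   // everything buffered so far is now durable
    }
  }

  public static void main(String[] args) throws InterruptedException {
    GroupCommitSketch log = new GroupCommitSketch();
    int numThreads = 20, editsPerThread = 500;
    Thread[] threads = new Thread[numThreads];
    for (int i = 0; i < numThreads; i++) {
      threads[i] = new Thread(() -> {
        for (int e = 0; e < editsPerThread; e++) {
          long tx = log.logEdit();
          log.sync(tx);        // may be satisfied by another thread's flush
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) t.join();
    System.out.println("edits: " + numThreads * editsPerThread
        + ", real syncs: " + realSyncs);
  }
}
```

With more threads, more edits pile up between flushes, so the count of real syncs falls well below the edit count, which matches the shape of Table 1.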

h3. Block report benchmark.

In this benchmark each thread pretends it is a data-node and calls 
blockReport() with the same blocks.
All blocks are real, that is, they were previously allocated by the 
name-node and assigned to the data-nodes.
Some reports can contain fake blocks, and some can have missing blocks.
Each block report consists of 10,000 blocks. The total number of reports 
sent is 1,000.
The reports are divided equally among the data-nodes so that each of them 
sends the same number of reports.
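
The structure of this benchmark might be sketched as follows (again hypothetical names, not the patch's code): each thread plays the role of one data-node and submits its share of the 1,000 reports, reusing the same pre-built block list:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the block-report benchmark structure: each thread acts as a
// data-node and repeatedly submits the same pre-built report of 10,000
// blocks. blockReport() here is a stand-in for the direct name-node call.
public class BlockReportBenchSketch {
  static final int BLOCKS_PER_REPORT = 10000;
  static final int TOTAL_REPORTS = 1000;
  static final AtomicLong processedBlocks = new AtomicLong();
  static final Object NAME_NODE_LOCK = new Object();

  // Hypothetical stand-in for the name-node's blockReport() handler.
  static void blockReport(long[] blocks) {
    synchronized (NAME_NODE_LOCK) {  // reports are processed one at a time
      processedBlocks.addAndGet(blocks.length);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int numDataNodes = 4;
    int reportsPerNode = TOTAL_REPORTS / numDataNodes; // equal share per node
    long[] blocks = new long[BLOCKS_PER_REPORT];       // same blocks reused by all

    Thread[] nodes = new Thread[numDataNodes];
    long start = System.currentTimeMillis();
    for (int i = 0; i < numDataNodes; i++) {
      nodes[i] = new Thread(() -> {
        for (int r = 0; r < reportsPerNode; r++) {
          blockReport(blocks);
        }
      });
      nodes[i].start();
    }
    for (Thread t : nodes) t.join();
    long elapsed = Math.max(1, System.currentTimeMillis() - start);
    System.out.println("blocks processed: " + processedBlocks.get()
        + ", reports/sec: " + (TOTAL_REPORTS * 1000L / elapsed));
  }
}
```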

Here is the table with the results.

|| data-nodes || time (msec) || ops/sec ||
| 1 | 42234 | 24 |
| 2 | 9412 | 106 |
| 4 | 11465 | 87 |
| 10 | 15632 | 64 |
| 20 | 17623 | 57 |
| 40 | 19563 | 51 |
| 100 | 24315 | 41 |
| 200 | 29789 | 34 |
| 400 | 23636 | 42 |
| 600 | 39682 | 26 |

I have not had time to analyze this yet, so comments are welcome.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.