[
https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215309#comment-13215309
]
Todd Lipcon commented on HDFS-1623:
-----------------------------------
I've completed a round of preliminary performance testing on a 100-node
cluster; each node has 48GB RAM, gigabit Ethernet, 6 disks, and dual quad-core
CPUs with hyperthreading.
The three builds were trunk (rev 5b8bdf3c11fc69a9076e2b1e3385ecd179975c7d), HA
branch (tip), and the same HA branch but with HA turned off.
The HA NN is configured to log to an NFS mount, which unfortunately is just a
normal Linux NFS box in this case, in the same rack. That box was also,
unfortunately, running a DN; I didn't realize this until I was halfway through
the benchmarks and didn't want to start over from scratch.
I ran the TestDFSIO benchmarks as suggested by Konstantin, first with 95 files
and then with 380 files, each file 5GB. I will attach the results as a TSV
file momentarily. The overall summary is that there seems to be a ~4% hit for
turning on HA in the write benchmark. The read benchmark has too much
variability to draw a conclusion. Even the write benchmark isn't entirely
consistent, given that the HA branch with HA disabled actually ran some 2.5%
faster than trunk. (These numbers are from the 380-file case.)
I also ran teragen in two different scenarios. The first scenario is a
realistic workload (256M blocks):
|| Build || Runtime || Slot time ||
| HA On | 8m5s | 3682m |
| HA Off | 8m12s | 3756m |
| Trunk | 7m10s | 3163m |
Here there seems to be a significant performance degradation on the HA branch
(about 20%).
The second workload was to try to stress the system by setting block size to
4MB (resulting in ~800-1000 block allocations/second):
|| Build || Runtime || Slot time || Edit log size ||
| HA On | 12m24s | 6655m | 1.2GB |
| HA Off | 8m40s | 4469m | 1.2GB |
| Trunk | 7m4s | 3375m | 6.2MB |
Note the much bigger degradation here. I also included the size of the edit
logs in this latter benchmark.
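For concreteness, here is a quick sanity check of the HA-on vs. trunk overheads implied by the two tables (times are copied from the tables above; the helper function is mine, not anything from the Hadoop code):

```python
# Quick sanity check of the HA-on vs. trunk overhead implied by the
# two teragen tables above. Times are copied from the tables; the
# helper name is mine, not anything from the HDFS code.

def pct_slower(test_secs, base_secs):
    """Percent slowdown of `test_secs` relative to `base_secs`."""
    return 100.0 * (test_secs - base_secs) / base_secs

# 256M-block run: HA-on 8m5s vs. trunk 7m10s (runtime),
# 3682 vs. 3163 slot-minutes.
print(round(pct_slower(8 * 60 + 5, 7 * 60 + 10), 1))   # 12.8 (% runtime)
print(round(pct_slower(3682, 3163), 1))                # 16.4 (% slot time)

# 4MB-block run: HA-on 12m24s vs. trunk 7m4s (runtime),
# 6655 vs. 3375 slot-minutes.
print(round(pct_slower(12 * 60 + 24, 7 * 60 + 4), 1))  # 75.5 (% runtime)
print(round(pct_slower(6655, 3375), 1))                # 97.2 (% slot time)
```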
I'm pretty confident from looking at jstacks while this was running that the
bad performance is due to the new "persistBlocks" calls done in the HA branch.
We used to be sloppy about persisting blocks, whereas now we actually write
down all of the block allocations as they proceed. The 4MB block case was much
worse, since each file written by teragen consisted of 423 blocks. The
persistBlocks calls towards the end of each file were logging several KB of
data apiece, and this resulted in the very large edit log shown in the table
above. The difference between HA-on and HA-off is that HA-on mode actually
fsyncs all of these block allocations.
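The edit-log blowup is consistent with each persistBlocks call re-logging the file's full block list so far. That is my reading of the behavior described above; the script below is a back-of-the-envelope model, not actual HDFS code:

```python
# Back-of-the-envelope model of edit-log growth when the k-th block
# allocation re-logs all k blocks allocated so far. This is my reading
# of the persistBlocks behavior described above, not actual HDFS code.

BLOCKS_PER_FILE = 423  # from the 4MB-block teragen run above

# Total block records logged per file: 1 + 2 + ... + 423,
# i.e. quadratic in the number of blocks per file.
records_per_file = sum(range(1, BLOCKS_PER_FILE + 1))
print(records_per_file)  # 89676

# Logging each block once (roughly what trunk's sloppier persistence
# amounts to) would write only 423 records per file instead.
print(records_per_file / BLOCKS_PER_FILE)  # 212.0
```

So a constant-factor fsync cost aside, the per-file logging volume scales quadratically with file length under this model, which lines up with the 1.2GB vs. 6.2MB edit-log sizes in the table.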
So before we merge I think we should do a bit of optimization in this area. I
will file a JIRA this evening or tomorrow with a couple of easy wins.
> High Availability Framework for HDFS NN
> ---------------------------------------
>
> Key: HDFS-1623
> URL: https://issues.apache.org/jira/browse/HDFS-1623
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Attachments: HA-tests.pdf, HDFS-1623.trunk.patch,
> HDFS-High-Availability.pdf, NameNode HA_v2.pdf, NameNode HA_v2_1.pdf,
> Namenode HA Framework.pdf, ha-testplan.pdf, ha-testplan.tex
>
>