[ https://issues.apache.org/jira/browse/ACCUMULO-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597852#comment-13597852 ]
Josh Elser commented on ACCUMULO-1083:
--------------------------------------
bq. [~ecn] Concurrent, as in, a subset of tablets use a log, or concurrent, as
in, there are multiple log files available to log to, as described in the
Bigtable paper?
I think this got lost in the chatter.
bq. [~vines] Hadoop sync/appends have gone through several iterations since
they were introduced into the core release. Does anyone have any info on the
performance of appends in the various hadoop releases?
I would be curious about the results from running the same test against Hadoop
0.23 or 2.0.
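To be concrete about what I'd want to measure, something like the following
(a minimal sketch only, assuming the Hadoop 2.x-style API; on 0.20.x you'd
call sync() instead of hflush(), and the path and record size are made up):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// WAL-style write pattern: many small sequential writes, each flushed to the
// datanode pipeline before the next one. Path and record size are made up.
public class WalSyncBench {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/wal-bench"); // hypothetical test file
    byte[] record = new byte[1024];         // stand-in for a serialized mutation
    int numRecords = 100000;

    long start = System.currentTimeMillis();
    FSDataOutputStream out = fs.create(path, true);
    for (int i = 0; i < numRecords; i++) {
      out.write(record);
      out.hflush(); // on 0.20.x/1.x this is FSDataOutputStream.sync()
    }
    out.close();
    long ms = System.currentTimeMillis() - start;
    double mb = (double) numRecords * record.length / (1024 * 1024);
    System.out.printf("%.2f MB/s%n", mb / (ms / 1000.0));
  }
}
{code}

Running the same jar against 0.23 and 2.0 would at least isolate the raw sync
cost from everything Accumulo layers on top of it.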
bq. [~billie.rinaldi] If people think it's important for everyone to be able to
edit their own comments, we can ask.
I'd have to agree it would be nice for everyone to be able to edit their own
comments, unless there's a reason not to.
bq. [~brassard] It's also worth noting that the default Accumulo behavior is to
use the default of Hadoop, so it's not necessarily set to 3 automatically.
Given that we used to run with a replication factor of 2 by default (and we
didn't have widespread data-loss issues), we could have Accumulo setrep the
WAL directory to 2 by default. But I think that merits a discussion parallel
to the one we're already having.
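For clarity, a sketch of what I mean (the WAL path here is hypothetical, and
since HDFS replication is a per-file attribute, the real change would likely
pass the replication factor to create() rather than fix files up afterward):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough Java equivalent of 'hadoop fs -setrep 2' over an existing WAL dir.
// Replication is per-file in HDFS, so we walk the directory's files.
public class SetWalReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path walDir = new Path("/accumulo/wal"); // hypothetical WAL location
    for (FileStatus status : fs.listStatus(walDir)) {
      if (!status.isDir()) {
        fs.setReplication(status.getPath(), (short) 2);
      }
    }
  }
}
{code}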
bq. [~bills] For 1.5, that table seems to indicate that adding n replicas of a
log file results in a 1/(n + 1) effect on aggregate throughput.
Sounds about right.
But I suspect the picture is a bit larger than what we're looking at here.
[~brassard], can you post the relevant HDFS configurations (HDFS block size,
datanode handler count, datanode xcievers, etc.)? Given the beefiness of your
nodes, performance seems low to me. Doing some hand-wavy math: 15 drives * 8
nodes = 120 drives, at ~100 MB/s of sequential write per drive, gives ~12 GB/s
of write across your cluster. Even knocking that back to, say, 1.2 GB/s of
"actual" write, that's still far above 5.7 MB/s * 8 nodes * 3 replicas
≈ 137 MB/s.
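Spelled out (the 100 MB/s per-drive rate is my assumption, not a measurement):

{code:java}
// Back-of-envelope only; nothing here is measured except the 5.7 MB/s figure.
public class BackOfEnvelope {
  public static void main(String[] args) {
    int drives = 15 * 8;                 // 120 drives cluster-wide
    double rawMBs = drives * 100.0;      // ~12,000 MB/s, i.e. ~12 GB/s raw
    double actualMBs = rawMBs / 10.0;    // knocked back to ~1.2 GB/s "actual"
    double observedMBs = 5.7 * 8 * 3;    // ~137 MB/s written, replicas included
    System.out.printf("raw=%.0f MB/s, actual=%.0f MB/s, observed=%.1f MB/s%n",
        rawMBs, actualMBs, observedMBs);
  }
}
{code}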
With 32 cores, I wouldn't think you're CPU-bound, and given the math above it
doesn't seem like you should be I/O-bound either. I assume you're also
memory-heavy? Do you have stats from Ganglia or iostat that show I/O
performance? What about network performance? What filesystem are you using for
the partitions hosting the data dirs?
Without doing the research, I also don't know how the TServer "finishing" a
mutation write aligns with the replica writes. Does each datanode writing a
replica need to complete its write before the TServer returns to the client,
or does only one datanode need to finish? Lots of questions :D
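For what it's worth, my understanding (worth verifying against the HDFS write
pipeline code) is that a flushed write isn't acknowledged until every datanode
in the pipeline acks the packet. Sketched from the client's side, with a
made-up path:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Under that assumption: write() only buffers client-side, and hflush()
// blocks until every datanode in the pipeline acks the packet (without
// necessarily syncing to disk). One slow replica then stalls every append.
public class WalAckSemantics {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream wal = fs.create(new Path("/accumulo/wal/log1"), true);
    wal.write("row:cf:cq:value".getBytes()); // stand-in mutation, buffered locally
    wal.hflush(); // returns once the full replica pipeline has acked
    wal.close();
  }
}
{code}

If that's right, the latency of every log append is bounded by the slowest
replica in the pipeline, which would line up with the 1/(n + 1) scaling
[~bills] pointed out.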
> add concurrency to HDFS write-ahead log
> ---------------------------------------
>
> Key: ACCUMULO-1083
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1083
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Reporter: Adam Fuchs
> Fix For: 1.6.0
>
> Attachments: walog-performance.jpg,
> walog-replication-factor-performance.jpg
>
>
> When running tablet servers on beefy nodes (lots of disks), the write-ahead
> log can be a serious bottleneck. Today we ran a continuous ingest test of
> 1.5-SNAPSHOT on an 8-node (plus a master node) cluster in which the nodes had
> 32 cores and 15 drives each. Running with the write-ahead log off resulted
> in a >4x performance improvement sustained over a long period.
> I believe the culprit is that the WAL is only using one file at a time per
> tablet server, which means HDFS is only appending to one drive (plus
> replicas). If we increase the number of concurrent WAL files supported on a
> tablet server we could probably drastically improve the performance on
> systems with many disks. As it stands, I believe Accumulo is significantly
> more optimized for a larger number of smaller nodes (3-4 drives).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira