[ 
https://issues.apache.org/jira/browse/ACCUMULO-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600405#comment-13600405
 ] 

Keith Turner commented on ACCUMULO-1083:
----------------------------------------

bq. Concurrent, as in, a sub-set of tablets use a log, or concurrent, as in, 
there are multiple log files available to log to, as described in the Big Table 
paper?

We will need to decide which of those strategies to use.  One has implications 
for writes and the other has implications for recovery.  If a subset of the 
tablets uses a log, then every batch of mutations coming into a tablet server 
may need to be split and queued to multiple walogs.  If a batch of mutations 
can be written to just one walog, then that walog will need to be added to all 
tablets that the batch has data for.  I think the second approach has the 
potential for better write performance, but it can cause tablets to have more 
walogs than they do now.  I am not sure whether an individual tablet having 
more walogs matters.  One thing that matters for recovery is the amount of 
data that needs to be sorted when a tserver fails, and I am not sure this 
really differs between the two approaches.
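
To make the trade-off concrete, here is a rough sketch of the two strategies 
as a toy in-memory model.  It is not Accumulo code: the class, the method 
names, the tablet ids, and the integer walog ids are all hypothetical, and no 
real logging or HDFS I/O takes place.  It only shows where the extra work 
lands in each case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: every name here is hypothetical and none of it is an Accumulo API.
class WalogStrategies {

    /** A mutation destined for a single tablet. */
    static final class Mutation {
        final String tablet;
        final String data;

        Mutation(String tablet, String data) {
            this.tablet = tablet;
            this.data = data;
        }
    }

    /**
     * Strategy 1: each tablet is pinned to one walog, so an incoming batch
     * is split into one queue per walog (the cost is paid at write time).
     */
    static Map<Integer, List<Mutation>> splitByPinnedLog(
            List<Mutation> batch, Map<String, Integer> tabletToLog) {
        Map<Integer, List<Mutation>> perLog = new TreeMap<>();
        for (Mutation m : batch) {
            Integer log = tabletToLog.get(m.tablet);
            if (log == null) {
                throw new IllegalArgumentException("no walog assigned to tablet " + m.tablet);
            }
            perLog.computeIfAbsent(log, k -> new ArrayList<>()).add(m);
        }
        return perLog;
    }

    /**
     * Strategy 2: the whole batch goes to a single walog, and that walog is
     * recorded against every tablet the batch touched (so a tablet can
     * accumulate references to several walogs over time).
     */
    static void recordWholeBatch(
            List<Mutation> batch, int log, Map<String, Set<Integer>> tabletLogs) {
        for (Mutation m : batch) {
            tabletLogs.computeIfAbsent(m.tablet, k -> new TreeSet<>()).add(log);
        }
    }

    public static void main(String[] args) {
        List<Mutation> batch = List.of(
                new Mutation("t1", "a"), new Mutation("t2", "b"), new Mutation("t1", "c"));

        // Strategy 1: one batch becomes two queued writes.
        Map<Integer, List<Mutation>> split =
                splitByPinnedLog(batch, Map.of("t1", 0, "t2", 1));
        System.out.println("strategy 1 walogs written for one batch: " + split.keySet());

        // Strategy 2: each batch is one write, but tablets collect walog references.
        Map<String, Set<Integer>> tabletLogs = new TreeMap<>();
        recordWholeBatch(batch, 0, tabletLogs);
        recordWholeBatch(List.of(new Mutation("t1", "d")), 1, tabletLogs);
        System.out.println("strategy 2 walogs referenced per tablet: " + tabletLogs);
    }
}
```

In this model the first strategy turns one incoming batch into two queued 
writes, while the second keeps each batch as a single write but leaves tablet 
t1 referencing two walogs after two batches.  Neither function says anything 
about how much data has to be sorted during recovery, which is the open 
question above.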
                
> add concurrency to HDFS write-ahead log
> ---------------------------------------
>
>                 Key: ACCUMULO-1083
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1083
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Adam Fuchs
>             Fix For: 1.6.0
>
>         Attachments: walog-performance.jpg, 
> walog-replication-factor-performance.jpg
>
>
> When running tablet servers on beefy nodes (lots of disks), the write-ahead 
> log can be a serious bottleneck. Today we ran a continuous ingest test of 
> 1.5-SNAPSHOT on an 8-node (plus a master node) cluster in which the nodes had 
> 32 cores and 15 drives each. Running with write-ahead log off resulted in a 
> >4x performance improvement sustained over a long period.
> I believe the culprit is that the WAL is only using one file at a time per 
> tablet server, which means HDFS is only appending to one drive (plus 
> replicas). If we increase the number of concurrent WAL files supported on a 
> tablet server we could probably drastically improve the performance on 
> systems with many disks. As it stands, I believe Accumulo is significantly 
> more optimized for a larger number of smaller nodes (3-4 drives).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
