[ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Busbey updated ACCUMULO-2668:
----------------------------------
Labels: 16_qa_bug (was: )
> slow WAL writes
> ---------------
>
> Key: ACCUMULO-2668
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Jonathan Park
> Labels: 16_qa_bug
> Attachments: noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by
> writes to the WAL. When we ran the DfsLogger in isolation (created one
> outside of the Tserver), we saw roughly 25MB/s of throughput, as opposed to
> nearly 100MB/s from writing directly to an HDFS output stream. Throughput
> here was computed as the estimated size of the mutations sent to the
> DfsLogger class divided by the time it took to flush + sync the data to HDFS.
> After investigating, we found one possible culprit was the
> NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does
> not override the write(byte[], int, int) method. The inherited implementation
> calls write(int) once per byte, and the javadoc indicates that subclasses of
> FilterOutputStream should provide a more efficient implementation.
> I've attached a small diff that illustrates and addresses the issue, but this
> may not be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it
> looks like we always make use of the NoFlushOutputStream, even if encryption
> isn't enabled. There appears to be a faulty check in the DfsLogger.open()
> implementation that I don't believe can be satisfied (line 384).
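
To make the FilterOutputStream point above concrete, here is a minimal Java
sketch. It is not the attached noflush.diff and not Accumulo code: the class
names WriteOverrideSketch, CountingSink, PerByteWrapper and BulkWrapper are
hypothetical, the sink only counts calls instead of writing to HDFS, and any
flush handling is left out. It only shows how many calls reach the wrapped
stream with and without a write(byte[], int, int) override.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class WriteOverrideSketch {

    /** Sink that only counts how it was called; stands in for the wrapped stream. */
    static class CountingSink extends OutputStream {
        long singleByteCalls;
        long bulkCalls;
        long bytes;

        @Override
        public void write(int b) {
            singleByteCalls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bulkCalls++;
            bytes += len;
        }
    }

    /** Hypothetical wrapper that inherits FilterOutputStream's per-byte loop. */
    static class PerByteWrapper extends FilterOutputStream {
        PerByteWrapper(OutputStream out) {
            super(out);
        }
    }

    /** Hypothetical wrapper that forwards the whole range in one call. */
    static class BulkWrapper extends FilterOutputStream {
        BulkWrapper(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
        }
    }

    /**
     * Writes one n-byte buffer through the chosen wrapper and returns
     * {singleByteCalls, bulkCalls, bytes} as seen by the sink.
     */
    static long[] countCalls(boolean bulk, int n) {
        CountingSink sink = new CountingSink();
        try (OutputStream w = bulk ? new BulkWrapper(sink) : new PerByteWrapper(sink)) {
            w.write(new byte[n], 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {sink.singleByteCalls, sink.bulkCalls, sink.bytes};
    }

    public static void main(String[] args) {
        long[] inherited = countCalls(false, 4096);
        long[] overridden = countCalls(true, 4096);
        System.out.println("inherited: " + inherited[0] + " single-byte calls, "
                + inherited[1] + " bulk calls");
        System.out.println("override:  " + overridden[0] + " single-byte calls, "
                + overridden[1] + " bulk calls");
    }
}
```

For a single 4096-byte write, the inherited path reaches the sink as 4096
write(int) calls, while the override reaches it as one bulk call. Whether this
accounts for the whole gap reported above would still need to be measured
against a real HDFS stream.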
--
This message was sent by Atlassian JIRA
(v6.2#6252)