Jonathan Park created ACCUMULO-2668:
---------------------------------------
Summary: slow WAL writes
Key: ACCUMULO-2668
URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
Project: Accumulo
Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Jonathan Park
Attachments: noflush.diff
During continuous ingest, we saw over 70% of our ingest time taken up by writes
to the WAL. When we ran the DfsLogger in isolation (instantiated outside of the
Tserver), we saw about 25MB/s of throughput, computed as the estimated size of
the mutations sent to the DfsLogger divided by the time it took to flush and
sync the data to HDFS.
After investigating, we found that one possible culprit is the
NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does
not override the write(byte[], int, int) method, so it inherits the default
implementation, which calls write(int) once for each byte in the range. The
javadoc says that subclasses of FilterOutputStream should provide a more
efficient implementation of this method.
I've attached a small diff (noflush.diff) that illustrates and addresses the
issue, but this may not be how we ultimately want to fix it.
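
For readers without the attachment, here is a minimal, self-contained sketch of
the behavior described above. It is not the attached diff and not Accumulo
code: CountingSink, InheritedWrite, BulkWrite and WalStreamSketch are
hypothetical names, the sink is an in-memory stand-in for the HDFS stream, and
the no-op flush() is only assumed from the class name NoFlushOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical in-memory sink that counts how it is called.
class CountingSink extends ByteArrayOutputStream {
  int singleByteWrites = 0;
  int bulkWrites = 0;

  @Override
  public synchronized void write(int b) {
    singleByteWrites++;
    super.write(b);
  }

  @Override
  public synchronized void write(byte[] b, int off, int len) {
    bulkWrites++;
    super.write(b, off, len);
  }
}

// Inherits FilterOutputStream.write(byte[], int, int), which loops over write(int).
class InheritedWrite extends FilterOutputStream {
  InheritedWrite(OutputStream out) {
    super(out);
  }

  @Override
  public void flush() {} // assumed no-op, going by the name NoFlushOutputStream
}

// Same wrapper, but a bulk write is handed to the underlying stream in one call.
class BulkWrite extends InheritedWrite {
  BulkWrite(OutputStream out) {
    super(out);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);
  }
}

class WalStreamSketch {
  // Writes one buffer of the given size through a wrapper and returns the sink.
  static CountingSink run(boolean overrideBulkWrite, int size) {
    CountingSink sink = new CountingSink();
    OutputStream wrapper = overrideBulkWrite ? new BulkWrite(sink) : new InheritedWrite(sink);
    try {
      wrapper.write(new byte[size], 0, size);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink;
  }

  public static void main(String[] args) {
    CountingSink slow = run(false, 1024);
    CountingSink fast = run(true, 1024);
    System.out.println("inherited: " + slow.singleByteWrites + " single-byte, "
        + slow.bulkWrites + " bulk");
    System.out.println("override:  " + fast.singleByteWrites + " single-byte, "
        + fast.bulkWrites + " bulk");
  }
}
```

With a 1024-byte buffer, the inherited path reaches the sink as 1024
single-byte writes, while the overriding wrapper reaches it as one bulk write.
How much this costs against a real HDFS output stream would still need to be
measured.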
As a side note, I may be misreading the implementation of DfsLogger, but it
looks like we always use the NoFlushOutputStream, even if encryption isn't
enabled. There appears to be a faulty check in DfsLogger.open() (line 384)
that I don't believe can ever be satisfied.