Jonathan Park created ACCUMULO-2668:
---------------------------------------

             Summary: slow WAL writes
                 Key: ACCUMULO-2668
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
             Project: Accumulo
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Jonathan Park
         Attachments: noflush.diff

During continuous ingest, we saw over 70% of our ingest time taken up by writes 
to the WAL. When we ran the DfsLogger in isolation (created one outside of the 
Tserver), we saw roughly 25MB/s throughput (computed as the estimated size of 
the mutations sent to the DfsLogger class divided by the time it took to flush 
and sync the data to HDFS).

After investigating, we found that one possible culprit is the 
NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does 
not override the write(byte[], int, int) method. The inherited implementation 
calls write(int) once for every byte, and the javadoc indicates that subclasses 
of FilterOutputStream should provide a more efficient implementation.
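
To make the cost concrete, here is a minimal standalone sketch (not the 
attached diff and not Accumulo code). The class names NoFlushSketch, 
CountingStream, InheritedWrite and BulkWrite are hypothetical, and the no-op 
flush() is an assumption based only on the NoFlushOutputStream name. It counts 
how many calls reach the wrapped stream for a single 4096-byte write, with and 
without an override of write(byte[], int, int).

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class NoFlushSketch {

  // Hypothetical sink that only counts how it is called.
  static class CountingStream extends OutputStream {
    long singleByteWrites = 0;
    long bulkWrites = 0;

    @Override
    public void write(int b) {
      singleByteWrites++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      bulkWrites++;
    }
  }

  // Stand-in for a filter stream that leaves write(byte[], int, int) inherited.
  // Assumption: flush() is a no-op, as the NoFlushOutputStream name suggests.
  static class InheritedWrite extends FilterOutputStream {
    InheritedWrite(OutputStream out) {
      super(out);
    }

    @Override
    public void flush() {}
  }

  // Same wrapper, but the bulk write is passed straight to the wrapped stream.
  static class BulkWrite extends FilterOutputStream {
    BulkWrite(OutputStream out) {
      super(out);
    }

    @Override
    public void flush() {}

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      out.write(b, off, len);
    }
  }

  // Returns {single-byte calls, bulk calls} seen by the wrapped stream
  // after one write of `size` bytes through the chosen wrapper.
  static long[] callsFor(boolean overrideBulkWrite, int size) {
    CountingStream sink = new CountingStream();
    OutputStream wrapper =
        overrideBulkWrite ? new BulkWrite(sink) : new InheritedWrite(sink);
    try {
      wrapper.write(new byte[size], 0, size);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new long[] {sink.singleByteWrites, sink.bulkWrites};
  }

  public static void main(String[] args) {
    long[] inherited = callsFor(false, 4096);
    long[] bulk = callsFor(true, 4096);
    System.out.println("inherited: single=" + inherited[0] + " bulk=" + inherited[1]);
    System.out.println("override:  single=" + bulk[0] + " bulk=" + bulk[1]);
  }
}
```

With the inherited method, every byte becomes its own call on the wrapped 
stream. With the override, the whole buffer is handed over in one call. How 
much of the WAL slowdown this accounts for would still need to be measured.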

I've attached a small diff (noflush.diff) that illustrates and addresses the 
issue, but this may not be how we ultimately want to fix it.

As a side note, I may be misreading the implementation of DfsLogger, but it 
looks like we always make use of the NoFlushOutputStream, even if encryption 
isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
implementation that I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
