[ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969216#comment-13969216 ]

ASF subversion and git services commented on ACCUMULO-2668:
-----------------------------------------------------------

Commit e4cef7f209551ebe17e43058e182ca22f8f89293 in accumulo's branch 
refs/heads/1.6.0-SNAPSHOT from [~parkjsung]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e4cef7f ]

ACCUMULO-2668 Override the write method which takes a byte[] to call the 
efficient method on the wrapped OutputStream

FilterOutputStream's implementation for this write method is horribly
inefficient, and causes a massive degradation in ingest performance.

Signed-off-by: Josh Elser <[email protected]>


> slow WAL writes
> ---------------
>
>                 Key: ACCUMULO-2668
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Jonathan Park
>            Assignee: Jonathan Park
>            Priority: Blocker
>              Labels: 16_qa_bug
>             Fix For: 1.6.1
>
>         Attachments: ACCUMULO-2668.0.patch.txt, noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by 
> writes to the WAL. When we ran the DfsLogger in isolation (created one 
> outside of the Tserver), we saw about 25MB/s throughput, as opposed to nearly 
> 100MB/s from writing directly to an HDFS output stream (computed by 
> taking the estimated size of the mutations sent to the DfsLogger class 
> divided by the time it took to flush + sync the data to HDFS).
>
> After investigating, we found that one possible culprit was the 
> NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does 
> not override the write(byte[], int, int) method. The javadoc 
> indicates that subclasses of FilterOutputStream should provide a more 
> efficient implementation.
>
> I've attached a small diff that illustrates and addresses the issue, but this 
> may not be how we ultimately want to fix it.
>
> As a side note, I may be misreading the implementation of DfsLogger, but it 
> looks like we always make use of the NoFlushOutputStream, even if encryption 
> isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
> implementation that I don't believe can be satisfied (line 384).
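
For illustration only, here is a minimal sketch of the behavior described
above. It is not the attached diff or the committed patch. The class names
(WriteDelegationDemo, CountingOutputStream, DefaultWrapper,
DelegatingWrapper) are hypothetical and do not exist in Accumulo. It relies
only on the documented java.io behavior: FilterOutputStream's
write(byte[], int, int) forwards one byte at a time via write(int) unless a
subclass overrides it.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class WriteDelegationDemo {

  /** Hypothetical sink that only counts how many write calls reach it. */
  static class CountingOutputStream extends OutputStream {
    int calls = 0;

    @Override
    public void write(int b) {
      calls++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      calls++;
    }
  }

  /** Hypothetical wrapper that inherits FilterOutputStream's byte-at-a-time write. */
  static class DefaultWrapper extends FilterOutputStream {
    DefaultWrapper(OutputStream out) {
      super(out);
    }
  }

  /** Hypothetical wrapper that hands the whole buffer to the wrapped stream. */
  static class DelegatingWrapper extends FilterOutputStream {
    DelegatingWrapper(OutputStream out) {
      super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      // One bulk call instead of len single-byte calls.
      out.write(b, off, len);
    }
  }

  /** Writes a buffer of the given size and returns the number of calls seen by the sink. */
  public static int underlyingCalls(boolean delegate, int size) {
    CountingOutputStream sink = new CountingOutputStream();
    try (OutputStream wrapper = delegate ? new DelegatingWrapper(sink) : new DefaultWrapper(sink)) {
      wrapper.write(new byte[size]);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.calls;
  }

  public static void main(String[] args) {
    System.out.println("default:    " + underlyingCalls(false, 4096) + " calls"); // 4096 calls
    System.out.println("delegating: " + underlyingCalls(true, 4096) + " calls");  // 1 call
  }
}
```

The counting sink stands in for the real HDFS stream. Each extra call is
cheap in this sketch; how much it costs against an actual HDFS output stream
is what the throughput numbers in the report measure.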



--
This message was sent by Atlassian JIRA
(v6.2#6252)
