[ 
https://issues.apache.org/jira/browse/HADOOP-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059933#comment-14059933
 ] 

Gopal V commented on HADOOP-10681:
----------------------------------

The synchronized blocks would've made a lot of sense if setInput() or 
decompress()/compress() were atomic.

Since each invocation only reads in part of the data (64 KB or so), the 
user has never been able to use this safely from multiple threads.
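
A minimal sketch of why per-method locking does not help here. ToyCompressor 
is a hypothetical stand-in, not the Hadoop Compressor API, and its 
"compression" is just the identity. The interleaving is replayed on a single 
thread so the result is deterministic; two real threads could hit the same 
ordering.

```java
import java.nio.charset.StandardCharsets;

public class InterleaveDemo {
    /** Hypothetical toy compressor; every method is individually synchronized. */
    static final class ToyCompressor {
        private byte[] pending = new byte[0];

        synchronized void setInput(byte[] data) {
            pending = data.clone();
        }

        synchronized byte[] compress() {
            byte[] out = pending;      // identity "compression" for the demo
            pending = new byte[0];
            return out;
        }
    }

    /** Replays an interleaving of two callers that per-method locks allow. */
    static String simulate() {
        ToyCompressor c = new ToyCompressor();
        byte[] a = "AAAA".getBytes(StandardCharsets.US_ASCII);
        byte[] b = "BBBB".getBytes(StandardCharsets.US_ASCII);

        c.setInput(a);                 // caller 1
        c.setInput(b);                 // caller 2 runs between caller 1's two calls
        byte[] out1 = c.compress();    // caller 1 receives caller 2's data
        byte[] out2 = c.compress();    // caller 2 receives nothing

        return new String(out1, StandardCharsets.US_ASCII) + "|"
             + new String(out2, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints "BBBB|"
    }
}
```

Each call holds the lock only for its own duration, so the setInput() + 
compress() sequence as a whole is unprotected. A caller that wants thread 
safety has to lock around the whole sequence anyway.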

To make sure this was never used with threading in something like HBase, I 
cross-checked: HBase has its own improved, unsynchronized version of gzip, 
which writes its own header/trailer chunks.

https://github.com/apache/hbase/blob/c61cb7fb55124547a36a6ef56afaec43676039f8/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/ReusableStreamGzipCodec.java#L100

> Remove synchronized blocks from SnappyCodec and ZlibCodec buffering inner loop
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-10681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10681
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: performance
>    Affects Versions: 2.2.0, 2.4.0, 2.5.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: perfomance
>         Attachments: HADOOP-10681.1.patch, HADOOP-10681.2.patch, 
> HADOOP-10681.3.patch, HADOOP-10681.4.patch, compress-cmpxchg-small.png, 
> perf-top-spill-merge.png, snappy-perf-unsync.png
>
>
> The current implementation of SnappyCompressor spends more time within the 
> Java loop that copies from the user buffer into the direct buffer allocated 
> to the compressor impl than it takes to compress the buffers.
> !perf-top-spill-merge.png!
> The bottleneck was found to be Java monitor code inside SnappyCompressor.
> The methods are neatly inlined by the JIT into the parent caller 
> (BlockCompressorStream::write), which unfortunately does not flatten out the 
> synchronized blocks.
> !compress-cmpxchg-small.png!
> The loop does a write of small byte[] buffers (each IFile key+value). 
> I counted approximately 6 monitor enter/exit blocks per k-v pair written.
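
A rough sketch of the buffering pattern described in the issue, assuming a 
64 KB direct buffer and small key/value writes. SyncBuffer and PlainBuffer 
are hypothetical toy classes, not the SnappyCompressor code, and the toy 
enters a monitor twice per pair rather than the roughly 6 times counted 
above. It only shows that a single-threaded writer gets the same bytes 
buffered either way, so the lock buys nothing on this path.

```java
import java.nio.ByteBuffer;

public class BufferCopyDemo {
    /** Hypothetical buffer whose copy method takes a monitor on every call. */
    static final class SyncBuffer {
        private final ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);
        private int monitorEntries = 0;   // counts synchronized put() calls only

        synchronized void put(byte[] b, int off, int len) {
            monitorEntries++;
            direct.put(b, off, len);
        }

        int buffered() { return direct.position(); }
        int monitorEntries() { return monitorEntries; }
    }

    /** Same copy logic with no locking. */
    static final class PlainBuffer {
        private final ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);

        void put(byte[] b, int off, int len) {
            direct.put(b, off, len);
        }

        int buffered() { return direct.position(); }
    }

    /** Writes `pairs` small key/value records; returns {syncBytes, plainBytes, monitorEntries}. */
    static int[] run(int pairs) {
        byte[] key = new byte[8];
        byte[] value = new byte[16];
        SyncBuffer sync = new SyncBuffer();
        PlainBuffer plain = new PlainBuffer();

        for (int i = 0; i < pairs; i++) {
            sync.put(key, 0, key.length);       // one monitor enter/exit
            sync.put(value, 0, value.length);   // and another
            plain.put(key, 0, key.length);
            plain.put(value, 0, value.length);
        }
        return new int[] { sync.buffered(), plain.buffered(), sync.monitorEntries() };
    }

    public static void main(String[] args) {
        int[] r = run(1000);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "24000 24000 2000"
    }
}
```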



--
This message was sent by Atlassian JIRA
(v6.2#6252)
