[ 
https://issues.apache.org/jira/browse/HADOOP-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038136#comment-14038136
 ] 

Gopal V commented on HADOOP-10681:
----------------------------------

The patch removes unsafe synchronized blocks from the individual methods in 
Snappy and Zlib codecs.

This synchronization is slow, and it still does not make a shared stream 
thread-safe in the most common usage pattern for CompressionCodec:

{code}
 while (!compressor.finished()) {
   compressor.compress(buffer, 0, buffer.length);
 }
{code}

When loops like this run across multiple threads, the per-method locks 
protect each call but not the loop as a whole. Sharing a compressor would 
ideally require explicit synchronization around the entire loop

{code}
synchronized(compressor) {
 while (!compressor.finished()) {
   compressor.compress(buffer, 0, buffer.length);
 }
}
{code}

to get correct stateful behaviour.
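
As a rough illustration of that caller-side locking, here is a minimal sketch. 
It uses java.util.zip.Deflater from the JDK as a stand-in for a Hadoop 
Compressor (an assumption, made only so the example runs without Hadoop on 
the classpath). The names SharedCompressorSketch, compressShared, decompress 
and allRoundTrip are hypothetical and do not come from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SharedCompressorSketch {

    // Hypothetical helper: runs one complete compression session on a shared
    // Deflater. The caller-side lock covers reset, input setup and the whole
    // finished()/deflate() loop, not just the individual calls.
    public static byte[] compressShared(Deflater shared, byte[] input) {
        byte[] buffer = new byte[64];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (shared) {
            shared.reset();
            shared.setInput(input);
            shared.finish();
            while (!shared.finished()) {
                int n = shared.deflate(buffer, 0, buffer.length);
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Hypothetical helper: inflates data with a private Inflater so the
    // result of compressShared can be checked.
    public static byte[] decompress(byte[] data, int expectedLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            byte[] result = new byte[expectedLength];
            int off = 0;
            while (off < result.length && !inflater.finished()) {
                int n = inflater.inflate(result, off, result.length - off);
                if (n == 0) {
                    break; // no progress possible: truncated or mixed-up input
                }
                off += n;
            }
            return Arrays.copyOf(result, off);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    // Hypothetical check: many tasks share one Deflater; each task must get
    // back exactly its own payload after a round trip.
    public static boolean allRoundTrip(int tasks) {
        Deflater shared = new Deflater();
        try {
            return IntStream.range(0, tasks).parallel().allMatch(i -> {
                byte[] input = new byte[1000];
                Arrays.fill(input, (byte) ('a' + i % 26));
                byte[] compressed = compressShared(shared, input);
                return Arrays.equals(input, decompress(compressed, input.length));
            });
        } finally {
            shared.end();
        }
    }

    public static void main(String[] args) {
        System.out.println("all round trips ok: " + allRoundTrip(16));
    }
}
```

The point of the sketch is that the lock is held for the whole session, so 
locks inside the individual methods add cost without adding safety for 
callers that already do this.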

As the code exists today, it is not thread-safe, and it still executes slow 
lock-prefixed x86_64 instructions.

The JNI code in SnappyCodec.c, shown below, already does its own mutex 
locking around the actual critical sections within.

{code}
JNIEXPORT jint JNICALL 
Java_org_apache_hadoop_io_compress_snappy_SnappyDecompressor_decompressBytesDirect
(JNIEnv *env, jobject thisj){
....
  // Get the input direct buffer
  LOCK_CLASS(env, clazz, "SnappyDecompressor");
{code}

> Remove synchronized blocks from SnappyCodec and ZlibCodec buffering
> -------------------------------------------------------------------
>
>                 Key: HADOOP-10681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10681
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: performance
>    Affects Versions: 2.2.0, 2.4.0, 2.5.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: perfomance
>         Attachments: HADOOP-10681.1.patch, compress-cmpxchg-small.png, 
> perf-top-spill-merge.png, snappy-perf-unsync.png
>
>
> The current implementation of SnappyCompressor spends more time in the 
> Java loop that copies from the user buffer into the direct buffer allocated 
> to the compressor impl than it takes to compress the buffers.
> !perf-top-spill-merge.png!
> The bottleneck was found to be Java monitor code inside SnappyCompressor.
> The methods are neatly inlined by the JIT into the parent caller 
> (BlockCompressorStream::write), which unfortunately does not flatten out the 
> synchronized blocks.
> !compress-cmpxchg-small.png!
> The loop does a write of small byte[] buffers (each IFile key+value). 
> I counted approximately 6 monitor enter/exit blocks per k-v pair written.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
