[ https://issues.apache.org/jira/browse/ACCUMULO-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072964#comment-15072964 ]

Eric Newton commented on ACCUMULO-624:
--------------------------------------

I wrote a little experiment: 10 threads allocate 100K decompressors each.

Using {{gz.returnDecompressor(gz.getDecompressor())}}, all threads complete in 
1.4 seconds.

Using {{gz.getCodec().createDecompressor()}}, all threads complete in 20 seconds.

So it is quite a bit faster to use the pool. But even without the pool, each 
thread allocates its 100K decompressors in 20 seconds, about 200 microseconds 
per allocation, still well under a millisecond.
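For reference, here is a minimal sketch of that experiment written against Hadoop's {{CodecPool}} directly (the {{gz}} object above is presumably Accumulo's compression wrapper, which delegates to {{CodecPool}}; the class name and timing harness here are illustrative, not the exact code I ran):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class DecompressorBench {
  static final int THREADS = 10;
  static final int ALLOCS_PER_THREAD = 100_000;

  public static void main(String[] args) throws Exception {
    GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());

    // Pooled: decompressors are checked out of and returned to CodecPool.
    time("pooled", () -> {
      for (int i = 0; i < ALLOCS_PER_THREAD; i++) {
        Decompressor d = CodecPool.getDecompressor(codec);
        CodecPool.returnDecompressor(d);
      }
    });

    // Unpooled: a fresh decompressor is created and released each time.
    time("unpooled", () -> {
      for (int i = 0; i < ALLOCS_PER_THREAD; i++) {
        Decompressor d = codec.createDecompressor();
        d.end(); // free native/buffer resources immediately
      }
    });
  }

  // Runs the body in THREADS threads and prints the wall-clock time.
  static void time(String label, Runnable body) throws InterruptedException {
    Thread[] workers = new Thread[THREADS];
    long start = System.nanoTime();
    for (int t = 0; t < THREADS; t++) {
      workers[t] = new Thread(body);
      workers[t].start();
    }
    for (Thread w : workers) {
      w.join();
    }
    System.out.printf("%s: %.1f s%n", label, (System.nanoTime() - start) / 1e9);
  }
}
{code}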

It seems we are not the only ones who think that [codec reuse may not be worth it|https://github.com/prestodb/presto-hive-apache/blob/master/src/main/java/org/apache/hadoop/hive/ql/io/CodecPool.java].

> iterators may open lots of compressors
> --------------------------------------
>
>                 Key: ACCUMULO-624
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-624
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Eric Newton
>
> A large iterator tree may create many instances of Compressors.  These 
> instances are pulled from a pool that never decreases in size.  So, if 50 
> simultaneous queries are run over dozens of files, each with a complex 
> iterator stack, there will be thousands of compressors created.  Each of 
> these holds a large buffer.  This can cause the server to run out of memory.


