[ https://issues.apache.org/jira/browse/ACCUMULO-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072964#comment-15072964 ]
Eric Newton commented on ACCUMULO-624:
--------------------------------------
I wrote a little experiment: 10 threads allocate 100K decompressors each.
Using {{gz.returnDecompressor(gz.getDecompressor())}}, all threads complete in
1.4 seconds.
Using {{gz.getCodec().createDecompressor()}}, all threads complete in 20
seconds.
So it is quite a bit faster to use the pool. But even without the pool, the
1,000,000 allocations finish in 20 seconds, roughly 20 microseconds apiece, so
a single allocation still takes well under a millisecond.
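For reference, a minimal sketch of what such an experiment could look like. It
uses Hadoop's {{CodecPool}} directly on the assumption that the {{gz}} helpers
above delegate to it; the class name and the thread/iteration constants are
illustrative, not the actual test code.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class DecompressorBench {
  static final int THREADS = 10;       // 10 threads, as in the experiment
  static final int PER_THREAD = 100_000; // 100K decompressors each

  public static void main(String[] args) throws Exception {
    CompressionCodec codec =
        ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
    run("pooled", codec, true);
    run("unpooled", codec, false);
  }

  static void run(String label, CompressionCodec codec, boolean pooled)
      throws InterruptedException {
    Thread[] threads = new Thread[THREADS];
    long start = System.nanoTime();
    for (int t = 0; t < THREADS; t++) {
      threads[t] = new Thread(() -> {
        for (int i = 0; i < PER_THREAD; i++) {
          if (pooled) {
            // borrow from the shared pool and return it immediately
            Decompressor d = CodecPool.getDecompressor(codec);
            CodecPool.returnDecompressor(d);
          } else {
            // allocate a fresh decompressor (and its buffer) every time
            codec.createDecompressor();
          }
        }
      });
      threads[t].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.printf("%s: %.1f s%n", label, (System.nanoTime() - start) / 1e9);
  }
}
{code}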
It seems we are not the only ones who think that [codec reuse may not be worth it|https://github.com/prestodb/presto-hive-apache/blob/master/src/main/java/org/apache/hadoop/hive/ql/io/CodecPool.java].
> iterators may open lots of compressors
> --------------------------------------
>
> Key: ACCUMULO-624
> URL: https://issues.apache.org/jira/browse/ACCUMULO-624
> Project: Accumulo
> Issue Type: Bug
> Components: tserver
> Reporter: Eric Newton
>
> A large iterator tree may create many Compressor instances. These
> instances are pulled from a pool that never decreases in size. So, if 50
> simultaneous queries are run over dozens of files, each with a complex
> iterator stack, there will be thousands of compressors created. Each of
> these holds a large buffer. This can cause the server to run out of memory.
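To make the failure mode described above concrete, here is a minimal sketch of
how a burst of concurrent borrows permanently grows Hadoop's {{CodecPool}};
the codec setup and the count of 5,000 are illustrative. Every instance handed
back by {{returnDecompressor}} is retained, buffer and all, and nothing ever
shrinks the pool.
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class PoolGrowth {
  public static void main(String[] args) {
    CompressionCodec codec =
        ReflectionUtils.newInstance(GzipCodec.class, new Configuration());

    // Simulate a burst: many decompressors checked out at the same time,
    // as when 50 queries scan dozens of files with deep iterator stacks.
    // The pool starts empty, so each call creates a brand-new instance.
    List<Decompressor> inUse = new ArrayList<>();
    for (int i = 0; i < 5_000; i++) {
      inUse.add(CodecPool.getDecompressor(codec));
    }

    // The burst ends and everything is returned, but the pool now keeps
    // all 5,000 instances (and their buffers) alive for the life of the JVM.
    for (Decompressor d : inUse) {
      CodecPool.returnDecompressor(d);
    }
  }
}
{code}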
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)