[
https://issues.apache.org/jira/browse/PARQUET-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133426#comment-15133426
]
Juliet Hougland commented on PARQUET-118:
-----------------------------------------
I've been working on [a
patch|https://github.com/jhlch/parquet-mr/tree/bytebuffers] to allow on-heap
byte buffers for easier Spark configuration. I've realized that [xerial
snappy-java itself explicitly disallows on-heap byte
buffers|https://github.com/xerial/snappy-java/blob/develop/src/main/java/org/xerial/snappy/Snappy.java#L136].
The best I've got for handling this is trying to make the same change there
too. Any thoughts? Any idea how receptive they may be?
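For reference, a minimal sketch (not code from the patch) of the behavior that
linked check enforces: snappy-java's ByteBuffer API rejects heap buffers while
direct buffers go through. Class and variable names here are mine; the Snappy
methods used (compress, maxCompressedLength) are from snappy-java's public API.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.xerial.snappy.Snappy;

public class SnappyDirectBufferCheck {
    public static void main(String[] args) throws IOException {
        byte[] payload = "parquet page bytes".getBytes(StandardCharsets.UTF_8);

        // Heap ByteBuffers: the check at Snappy.java#L136 rejects non-direct
        // buffers, so this call fails fast.
        ByteBuffer heapIn = ByteBuffer.wrap(payload);
        ByteBuffer heapOut = ByteBuffer.allocate(Snappy.maxCompressedLength(payload.length));
        try {
            Snappy.compress(heapIn, heapOut);
        } catch (Error e) {
            // snappy-java surfaces this as a SnappyError (NOT_A_DIRECT_BUFFER)
            System.out.println("heap ByteBuffers rejected: " + e.getMessage());
        }

        // Direct (off-heap) buffers, which SnappyDecompressor currently
        // allocates, are accepted.
        ByteBuffer directIn = ByteBuffer.allocateDirect(payload.length);
        directIn.put(payload);
        directIn.flip();
        ByteBuffer directOut = ByteBuffer.allocateDirect(Snappy.maxCompressedLength(payload.length));
        int compressedSize = Snappy.compress(directIn, directOut);
        System.out.println("compressed " + payload.length + " -> " + compressedSize + " bytes");
    }
}
{code}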
> Provide option to use on-heap buffers for Snappy compression/decompression
> --------------------------------------------------------------------------
>
> Key: PARQUET-118
> URL: https://issues.apache.org/jira/browse/PARQUET-118
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.6.0
> Reporter: Patrick Wendell
>
> The current code uses direct off-heap buffers for decompression. If many
> decompressors are instantiated across multiple threads, and/or the objects
> being decompressed are large, this can lead to a huge amount of off-heap
> allocation by the JVM. This can be exacerbated if, overall, there is no heap
> contention, since no GC will be performed to reclaim the space used by these
> buffers.
> It would be nice if there were a flag we could use to simply allocate on-heap
> buffers here:
> https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/codec/SnappyDecompressor.java#L28
> We ran into an issue today where these buffers totaled a very large amount of
> storage and caused our Java processes (running within containers) to be
> terminated by the kernel OOM-killer.
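A minimal sketch of the flag-controlled allocation the quoted description asks
for; the class and flag name are hypothetical, not the actual SnappyDecompressor
code, and any on-heap path would still have to work around snappy-java's
direct-buffer requirement noted in the comment above (e.g. by copying through a
reusable direct buffer or using the byte[] API).
{code:java}
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the requested option; the flag
// (e.g. "parquet.snappy.use-heap-buffers") would come from the Hadoop
// Configuration and be threaded into the codec.
public final class BufferAllocator {
    private final boolean useHeap;

    public BufferAllocator(boolean useHeap) {
        this.useHeap = useHeap;
    }

    /** On-heap buffers are GC-managed; direct buffers live off-heap. */
    public ByteBuffer allocate(int size) {
        return useHeap ? ByteBuffer.allocate(size) : ByteBuffer.allocateDirect(size);
    }
}
{code}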