[ 
https://issues.apache.org/jira/browse/PARQUET-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated PARQUET-1006:
-------------------------------------
    Description: 
After PARQUET-160 was resolved, ColumnChunkPageWriter started using 
ConcatenatingByteArrayCollector. With it, all data is collected in a List of 
byte[] before the page is written, so there is no way to use direct memory 
for allocating buffers. A ByteBufferAllocator is present in the 
[ColumnChunkPageWriter|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java#L73]
 class, but it is never used.

Using Java heap space can in some cases cause OOM exceptions or GC 
overhead. 
ByteBufferAllocator should be used in the ConcatenatingByteArrayCollector or 
OutputStream classes.

  was:
After PARQUET-160 was resolved, ColumnChunkPageWriter started using 
ConcatenatingByteArrayCollector. With it, all data is collected in a List of 
byte[] before the page is written, so there is no way to use direct memory 
for allocating buffers. A ByteBufferAllocator is present in the 
ColumnChunkPageWriter class, but it is never used.
Using Java heap space can in some cases cause OOM exceptions or GC 
overhead. 
ByteBufferAllocator should be used in the ConcatenatingByteArrayCollector or 
OutputStream classes.


> ColumnChunkPageWriter uses only heap memory.
> --------------------------------------------
>
>                 Key: PARQUET-1006
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1006
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.8.0
>            Reporter: Vitalii Diravka
>             Fix For: 1.9.0
>
>
> After PARQUET-160 was resolved, ColumnChunkPageWriter started using 
> ConcatenatingByteArrayCollector. With it, all data is collected in a List 
> of byte[] before the page is written, so there is no way to use direct 
> memory for allocating buffers. A ByteBufferAllocator is present in the 
> [ColumnChunkPageWriter|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java#L73]
>  class, but it is never used.
> Using Java heap space can in some cases cause OOM exceptions or GC 
> overhead. 
> ByteBufferAllocator should be used in the ConcatenatingByteArrayCollector or 
> OutputStream classes.
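
For illustration only, here is a minimal sketch of the idea in the description: a collector that takes its buffers from an allocator instead of keeping a List of byte[] on the heap. It is not the parquet-mr implementation and not a proposed patch. DirectCollectorSketch, SimpleAllocator and Collector are hypothetical names, SimpleAllocator is only a stand-in for an allocator abstraction such as ByteBufferAllocator, and the fixed slab size is an assumption made to keep the example short.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from parquet-mr.
class DirectCollectorSketch {

    // Stand-in for an allocator abstraction (assumption, not the real interface).
    interface SimpleAllocator {
        ByteBuffer allocate(int size);
    }

    // Collects bytes in fixed-size slabs obtained from the allocator,
    // so the allocator decides whether the memory is heap or direct.
    static final class Collector {
        private final SimpleAllocator allocator;
        private final int slabSize;
        private final List<ByteBuffer> slabs = new ArrayList<>();
        private long size = 0;

        Collector(SimpleAllocator allocator, int slabSize) {
            if (slabSize <= 0) {
                throw new IllegalArgumentException("slabSize must be positive");
            }
            this.allocator = allocator;
            this.slabSize = slabSize;
        }

        // Copies the input into the current slab, allocating a new slab when full.
        void collect(byte[] data) {
            int offset = 0;
            while (offset < data.length) {
                ByteBuffer current = slabs.isEmpty() ? null : slabs.get(slabs.size() - 1);
                if (current == null || !current.hasRemaining()) {
                    current = allocator.allocate(slabSize);
                    slabs.add(current);
                }
                int n = Math.min(current.remaining(), data.length - offset);
                current.put(data, offset, n);
                offset += n;
            }
            size += data.length;
        }

        long size() {
            return size;
        }

        int slabCount() {
            return slabs.size();
        }

        boolean allDirect() {
            for (ByteBuffer slab : slabs) {
                if (!slab.isDirect()) {
                    return false;
                }
            }
            return true;
        }

        // Writes every slab to the channel in order, without an intermediate byte[].
        void writeAllTo(WritableByteChannel out) throws IOException {
            for (ByteBuffer slab : slabs) {
                ByteBuffer view = slab.duplicate(); // leave the slab's own position untouched
                view.flip();                        // expose only the bytes written so far
                while (view.hasRemaining()) {
                    out.write(view);
                }
            }
        }
    }
}
```

Passing ByteBuffer::allocateDirect as the allocator keeps the collected bytes off the Java heap, while ByteBuffer::allocate gives the heap behaviour. The sketch never releases its slabs; a real change would also have to return them to the allocator once the page has been written.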



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
