[ https://issues.apache.org/jira/browse/PARQUET-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka updated PARQUET-1006:
-------------------------------------
Description:
After PARQUET-160 was resolved, ColumnChunkPageWriter started using
ConcatenatingByteArrayCollector, which collects all data in a List of byte[]
before the page is written. This leaves no way to use direct memory for
allocating buffers. A ByteBufferAllocator is present in the
[ColumnChunkPageWriter|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java#L73]
class, but it is never used.
Relying on Java heap space can, in some cases, cause OOM exceptions or GC
overhead.
ByteBufferAllocator should be used in the ConcatenatingByteArrayCollector or
OutputStream classes.
was:
After PARQUET-160 was resolved, ColumnChunkPageWriter started using
ConcatenatingByteArrayCollector, which collects all data in a List of byte[]
before the page is written. This leaves no way to use direct memory for
allocating buffers. A ByteBufferAllocator is present in the
ColumnChunkPageWriter class, but it is never used.
Relying on Java heap space can, in some cases, cause OOM exceptions or GC
overhead.
ByteBufferAllocator should be used in the ConcatenatingByteArrayCollector or
OutputStream classes.
> ColumnChunkPageWriter uses only heap memory.
> --------------------------------------------
>
> Key: PARQUET-1006
> URL: https://issues.apache.org/jira/browse/PARQUET-1006
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.0
> Reporter: Vitalii Diravka
> Fix For: 1.9.0
>
>
> After PARQUET-160 was resolved, ColumnChunkPageWriter started using
> ConcatenatingByteArrayCollector, which collects all data in a List of byte[]
> before the page is written. This leaves no way to use direct memory for
> allocating buffers. A ByteBufferAllocator is present in the
> [ColumnChunkPageWriter|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java#L73]
> class, but it is never used.
> Relying on Java heap space can, in some cases, cause OOM exceptions or GC
> overhead.
> ByteBufferAllocator should be used in the ConcatenatingByteArrayCollector or
> OutputStream classes.
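
To illustrate the request above, here is a minimal sketch of a collector that
keeps page bytes in allocator-provided ByteBuffers instead of a List of byte[].
It is not the parquet-mr code and not a proposed patch: BufferAllocator,
DirectBufferAllocator, ByteBufferCollector and CollectorDemo are hypothetical
names made up for this example, and only java.nio and java.io from the JDK are
used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an allocator abstraction (assumption, not the parquet-mr interface). */
interface BufferAllocator {
    ByteBuffer allocate(int size);
    void release(ByteBuffer buffer);
}

/** Hands out off-heap (direct) buffers; release is a no-op in this sketch. */
final class DirectBufferAllocator implements BufferAllocator {
    @Override
    public ByteBuffer allocate(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    @Override
    public void release(ByteBuffer buffer) {
        // Nothing to do here: the JVM reclaims direct buffers once they are unreachable.
    }
}

/** Hypothetical collector that stores each chunk in a buffer obtained from the allocator. */
final class ByteBufferCollector implements AutoCloseable {
    private final BufferAllocator allocator;
    private final List<ByteBuffer> slabs = new ArrayList<>();
    private long size;

    ByteBufferCollector(BufferAllocator allocator) {
        this.allocator = allocator;
    }

    /** Copies the remaining bytes of src into a newly allocated buffer. */
    void collect(ByteBuffer src) {
        ByteBuffer slab = allocator.allocate(src.remaining());
        slab.put(src.duplicate()); // duplicate() leaves the caller's position untouched
        slab.flip();               // switch the slab from writing to reading
        slabs.add(slab);
        size += slab.remaining();
    }

    /** Total number of bytes collected so far. */
    long size() {
        return size;
    }

    /** Writes every collected chunk, in insertion order, to the channel. */
    void writeAllTo(WritableByteChannel out) throws IOException {
        for (ByteBuffer slab : slabs) {
            ByteBuffer view = slab.duplicate(); // keep the slab re-readable
            while (view.hasRemaining()) {
                out.write(view);
            }
        }
    }

    /** Returns all buffers to the allocator and resets the collector. */
    @Override
    public void close() {
        for (ByteBuffer slab : slabs) {
            allocator.release(slab);
        }
        slabs.clear();
        size = 0;
    }
}

class CollectorDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ByteBufferCollector collector = new ByteBufferCollector(new DirectBufferAllocator())) {
            collector.collect(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            collector.collect(ByteBuffer.wrap(new byte[] {4, 5}));
            collector.writeAllTo(Channels.newChannel(sink));
            System.out.println(collector.size() + " bytes collected, " + sink.size() + " bytes written");
        }
    }
}
```

Because the allocator is injected, the same collector can be pointed at heap or
direct memory by swapping the BufferAllocator implementation, which is the
flexibility this issue asks for.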