Ketki Bukkawar created PARQUET-2166:
---------------------------------------

             Summary: parquet writer runs into OOM during writing
                 Key: PARQUET-2166
                 URL: https://issues.apache.org/jira/browse/PARQUET-2166
             Project: Parquet
          Issue Type: Bug
          Components: parquet-avro
    Affects Versions: 1.12.1, 1.10.1
            Reporter: Ketki Bukkawar


Hi team,
We are getting an OOM error when trying to write data to a Parquet file. Please see the stack trace below:

{quote}
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
at org.apache.parquet.hadoop.codec.SnappyCompressor.setInput(SnappyCompressor.java:97)
at org.apache.parquet.hadoop.codec.NonBlockedCompressorStream.write(NonBlockedCompressorStream.java:48)
at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeToOutput(CapacityByteArrayOutputStream.java:227)
at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:247)
at org.apache.parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:405)
at org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:296)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:164)
at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:148)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:130)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
at com.fivetran.warehouses.common.parquet.AvroBasedParquetWriterAdapter.write(AvroBasedParquetWriterAdapter.java:39)
{quote}
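
For reference, here is a minimal sketch of the kind of writer setup that hits this code path (AvroParquetWriter with Snappy compression). The schema, output path, and builder options below are illustrative assumptions, not the exact configuration inside our AvroBasedParquetWriterAdapter:

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class WriterSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative schema; the real table has a large UTF8 "content" column.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
            + "{\"name\":\"content\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

    // Snappy compression goes through SnappyCompressor.setInput, which
    // allocates direct ByteBuffers -- the allocation that fails in the trace above.
    try (ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/out.parquet"))
            .withSchema(schema)
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()) {
      // writer.write(record) eventually reaches
      // InternalParquetRecordWriter.checkBlockSizeReached -> flushRowGroupToStore,
      // which is where the row group is compressed and the OOM is thrown.
    }
  }
}
{code}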

We believe that most of the memory is being consumed by slabs. From the warning below we can see that the content column alone acquired 108 slabs (about 162 MB):

{quote}
[content] optional binary content (UTF8) { r:0 d: RunLengthBitPackingHybrid 64 bytes data: FallbackValuesWriter{ data: initial: DictionaryValuesWriter{ data: initial: dict:0 data: initial: values:0 data: initial:} data: fallback: PLAIN CapacityByteArrayOutputStream 108 slabs, 162,188,576 bytes data:} pages: ColumnChunkPageWriter ConcatenatingByteArrayCollector 0 slabs, 0 bytes total: 162,188,590/162,188,640 }
{quote}
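
In case it helps with triage: the knobs we are experimenting with are the row group and page sizes on the builder (so CapacityByteArrayOutputStream flushes before it accumulates this many slabs) and the JVM direct-memory limit. A hedged sketch with purely illustrative values, not a confirmed fix:

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class TunedWriterSketch {
  static ParquetWriter<GenericRecord> open(Path path, Schema schema) throws Exception {
    // Smaller row groups and pages mean the buffered column data (the "slabs"
    // reported by CapacityByteArrayOutputStream) is compressed and flushed
    // sooner, at the cost of a less optimal file layout.
    return AvroParquetWriter.<GenericRecord>builder(path)
        .withSchema(schema)
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .withRowGroupSize(32 * 1024 * 1024)   // default is 128 MB
        .withPageSize(512 * 1024)             // default is 1 MB
        .build();
    // The direct buffers allocated by SnappyCompressor are capped by the JVM
    // flag -XX:MaxDirectMemorySize (e.g. -XX:MaxDirectMemorySize=512m);
    // the value here is an illustrative guess for our workload.
  }
}
{code}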

Could you please help us resolve this issue?
Thanks




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
