Ketki Bukkawar created PARQUET-2166:
---------------------------------------
             Summary: parquet writer runs into OOM during writing
                 Key: PARQUET-2166
                 URL: https://issues.apache.org/jira/browse/PARQUET-2166
             Project: Parquet
          Issue Type: Bug
          Components: parquet-avro
    Affects Versions: 1.12.1, 1.10.1
            Reporter: Ketki Bukkawar


Hi team,

We are getting an OOM error when trying to write data to a Parquet file. Please see the stack trace below:

{quote}
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
	at org.apache.parquet.hadoop.codec.SnappyCompressor.setInput(SnappyCompressor.java:97)
	at org.apache.parquet.hadoop.codec.NonBlockedCompressorStream.write(NonBlockedCompressorStream.java:48)
	at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeToOutput(CapacityByteArrayOutputStream.java:227)
	at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:247)
	at org.apache.parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:405)
	at org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:296)
	at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:164)
	at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)
	at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
	at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
	at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:148)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:130)
	at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
	at com.fivetran.warehouses.common.parquet.AvroBasedParquetWriterAdapter.write(AvroBasedParquetWriterAdapter.java:39)
{quote}

We believe most of the memory is being consumed by slabs. From the warning below, we can see that the content column acquired 108 slabs:

{quote}
[content] optional binary content (UTF8) {
  r:0 d: RunLengthBitPackingHybrid 64 bytes
  data: FallbackValuesWriter{
  data:   initial: DictionaryValuesWriter{
  data:   initial:   dict:0
  data:   initial:   values:0
  data:   initial: }
  data:   fallback: PLAIN CapacityByteArrayOutputStream 108 slabs, 162,188,576 bytes
  data: }
  pages: ColumnChunkPageWriter ConcatenatingByteArrayCollector 0 slabs, 0 bytes
  total: 162,188,590/162,188,640
}
{quote}

Could you please help us resolve this issue?

Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
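A rough way to see how a slab-based buffer reaches figures like the 162 MB reported above: each column buffers its values in a list of slabs until the row group is flushed, and the slab count grows with the amount of buffered data. The sketch below is a generic doubling-slab allocator for illustration only; it is NOT the actual `CapacityByteArrayOutputStream` logic, and the initial/max slab sizes are assumptions, so the slab count it produces will not exactly match the 108 in the warning.

```java
import java.util.ArrayList;
import java.util.List;

public class SlabGrowthSketch {
    // Hypothetical policy: each new slab doubles in size, capped at maxSlabSize.
    // This is an illustration, not the real Parquet allocator.
    static List<Integer> slabSizesFor(long totalBytes, int initialSlab, int maxSlabSize) {
        List<Integer> slabs = new ArrayList<>();
        long allocated = 0;
        int next = initialSlab;
        while (allocated < totalBytes) {
            slabs.add(next);
            allocated += next;
            next = Math.min(next * 2, maxSlabSize);
        }
        return slabs;
    }

    public static void main(String[] args) {
        // ~162 MB buffered for one column, as in the warning above
        // (assumed 1 KiB initial slab, 2 MiB max slab).
        List<Integer> slabs = slabSizesFor(162_188_576L, 1024, 2 * 1024 * 1024);
        long total = slabs.stream().mapToLong(Integer::longValue).sum();
        System.out.println(slabs.size() + " slabs, " + total + " bytes");
    }
}
```

Note that the `OutOfMemoryError` itself is thrown from `java.nio.Bits.reserveMemory`, i.e. the compressor's *direct* buffers hit the JVM's direct-memory cap (controlled by the standard `-XX:MaxDirectMemorySize` flag) when the whole buffered row group is compressed in one pass.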