[ https://issues.apache.org/jira/browse/PARQUET-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tongjie Chen reopened PARQUET-99:
---------------------------------

Reopening this JIRA. We hit the same problem a few more times, and unfortunately there is no configuration to work around it. Basically, when there are continuous fat rows, Parquet needs an elegant way to handle them.

> parquet writer runs into OOM during writing
> -------------------------------------------
>
>                 Key: PARQUET-99
>                 URL: https://issues.apache.org/jira/browse/PARQUET-99
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: parquet-mr_1.6.0
>            Reporter: Tongjie Chen
>
> If a column contains lots of lengthy string values, the writer will run into an OOM error during writing.
> 2014-09-22 19:16:11,626 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2271)
> 	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> 	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> 	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> 	at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
> 	at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
> 	at parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:144)
> 	at parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:308)
> 	at parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:233)
> 	at parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:108)
> 	at parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:110)
> 	at parquet.column.impl.ColumnWriterImpl.writePage(ColumnWriterImpl.java:147)
> 	at parquet.column.impl.ColumnWriterImpl.flush(ColumnWriterImpl.java:236)
> 	at parquet.column.impl.ColumnWriteStoreImpl.flush(ColumnWriteStoreImpl.java:113)
> 	at parquet.hadoop.InternalParquetRecordWriter.flushStore(InternalParquetRecordWriter.java:151)
> 	at parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:130)
> 	at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:122)
> 	at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
> 	at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
> 	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77)
> 	at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:688)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
> 	at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
> 	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)

--
This message was
sent by Atlassian JIRA (v6.3.4#6332)
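
For reference, a minimal sketch of the writer-side knobs that parquet-mr 1.6.0 reads from the job configuration (the key names are the ones used by parquet.hadoop.ParquetOutputFormat; the values below are illustrative assumptions, not recommendations). As the reopening comment says, shrinking these buffers only reduces memory pressure; it does not eliminate the OOM for continuous fat rows, because each page is still assembled and recompressed in memory before it is written out, as the stack trace above shows.

    import org.apache.hadoop.conf.Configuration;

    public class ParquetWriterTuning {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Smaller row groups bound how much column data is buffered before a flush.
            conf.setInt("parquet.block.size", 64 * 1024 * 1024);
            // Smaller pages bound how much data is compressed in one shot.
            conf.setInt("parquet.page.size", 512 * 1024);
            conf.setInt("parquet.dictionary.page.size", 512 * 1024);
            // Long, mostly unique strings gain little from dictionary encoding.
            conf.setBoolean("parquet.enable.dictionary", false);
            conf.set("parquet.compression", "SNAPPY");
            // In Hive, the same keys can usually be set per session, e.g.
            //   set parquet.block.size=67108864;
        }
    }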