[ https://issues.apache.org/jira/browse/PARQUET-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014704#comment-15014704 ]

Ryan Blue commented on PARQUET-394:
-----------------------------------

Good to hear you found a solution. This is not an uncommon problem because so 
much data is buffered per file. Generally, try to avoid writing to multiple 
Parquet output files at once by shuffling data for each partition to a single 
reducer.
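
For example, a dynamic-partition insert can shuffle all rows for a given 
partition value to one reducer with DISTRIBUTE BY, so a task is not holding 
open Parquet writers (and their per-column buffers) for many partitions at 
once. This is only a sketch of that idea; the names target_table, 
source_table, and the partition column dt are hypothetical, not taken from 
this issue:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT OVERWRITE TABLE target_table PARTITION (dt)
    SELECT * FROM source_table      -- assumes dt is the last column (the dynamic partition column)
    DISTRIBUTE BY dt                -- all rows for one dt value go to a single reducer
    SORT BY dt;                     -- sorting lets a reducer close one partition's writer
                                    -- before opening the next

On newer Hive releases, setting hive.optimize.sort.dynamic.partition=true 
should have a similar effect without the explicit DISTRIBUTE BY / SORT BY.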

> OOM when writing a table with 1700 columns in hive
> --------------------------------------------------
>
>                 Key: PARQUET-394
>                 URL: https://issues.apache.org/jira/browse/PARQUET-394
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.5.0
>         Environment: CentOS + JVM 1.8.0_60, CDH 5.4.7
>            Reporter: Yuren Wu
>              Labels: OOM
>
> When running insert into <tablename> select * from <source>, the following 
> exception was thrown. The table has 1700 columns (all string), and the total 
> length of a single row is around 7000 bytes. 
> The YARN container size has been increased from 1 GB to 4 GB, which did not help much. 
> FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
>       at parquet.column.values.dictionary.IntList.initSlab(IntList.java:90)
>       at parquet.column.values.dictionary.IntList.<init>(IntList.java:86)
>       at parquet.column.values.dictionary.DictionaryValuesWriter.<init>(DictionaryValuesWriter.java:93)
>       at parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.<init>(DictionaryValuesWriter.java:229)
>       at parquet.column.ParquetProperties.dictionaryWriter(ParquetProperties.java:131)
>       at parquet.column.ParquetProperties.dictWriterWithFallBack(ParquetProperties.java:178)
>       at parquet.column.ParquetProperties.getValuesWriter(ParquetProperties.java:203)
>       at parquet.column.impl.ColumnWriterV1.<init>(ColumnWriterV1.java:84)
>       at parquet.column.impl.ColumnWriteStoreV1.newMemColumn(ColumnWriteStoreV1.java:68)
>       at parquet.column.impl.ColumnWriteStoreV1.getColumnWriter(ColumnWriteStoreV1.java:56)
>       at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.<init>(MessageColumnIO.java:183)
>       at parquet.io.MessageColumnIO.getRecordWriter(MessageColumnIO.java:375)
>       at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
>       at parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:127)
>       at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:118)
>       at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
>       at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
>       at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
>       at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:695)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>       at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>       at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>       at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>       at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
>       at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>       at java.security.AccessController.doPrivileged(Native Method)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
