[ https://issues.apache.org/jira/browse/PARQUET-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788297#comment-16788297 ]
Qinghui Xu commented on PARQUET-282:
------------------------------------
This does not look like a problem in parquet-mr itself; shall we close it?
> OutOfMemoryError in job commit / ParquetMetadataConverter
> ---------------------------------------------------------
>
> Key: PARQUET-282
> URL: https://issues.apache.org/jira/browse/PARQUET-282
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.6.0
> Environment: CentOS, MapR, Scalding
> Reporter: hy5446
> Priority: Critical
>
> We're trying to write some 14B rows (about 3.6 TB of Parquet data) to Parquet
> files. When our ETL job finishes, it throws the exception below, and the job
> status is "died in job commit".
> 2015-05-14 09:24:28,158 FATAL [CommitterEvent Processor #4] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[CommitterEvent Processor #4,5,main] threw an Error. Shutting down now...
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
>     at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
>     at parquet.format.Statistics.setMin(Statistics.java:237)
>     at parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:243)
>     at parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:167)
>     at parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:79)
>     at parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:405)
>     at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:433)
>     at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:423)
>     at parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
>     at parquet.hadoop.mapred.MapredParquetOutputCommitter.commitJob(MapredParquetOutputCommitter.java:43)
>     at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:259)
>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:253)
>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> This seems to have something to do with the _metadata file creation, as the
> Parquet data files themselves are perfectly fine and usable. Also, I'm not sure
> how to alleviate this (e.g. by adding more heap space), since the crash happens
> outside the Map/Reduce tasks themselves and appears to be in the
> job/application controller itself.
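For reference, a minimal sketch of the usual mitigations, assuming a plain MapReduce driver (a Scalding job would set the same Hadoop properties through its own job configuration): either skip the _metadata summary file so ParquetOutputCommitter has nothing to aggregate at commit time, or give the MR Application Master (where commitJob runs) a larger heap. The property keys below ("parquet.enable.summary-metadata", "yarn.app.mapreduce.am.resource.mb", "yarn.app.mapreduce.am.command-opts") are the ones I believe parquet-mr 1.6.x and stock Hadoop 2.x use; the heap sizes are illustrative values, not recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ParquetCommitWorkaround {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Option 1: don't write the _metadata / _common_metadata summary files at all,
            // so the committer never builds the merged footer in memory.
            conf.setBoolean("parquet.enable.summary-metadata", false);

            // Option 2: raise the heap of the MapReduce Application Master, since the
            // commit phase (CommitterEventHandler) runs there, not in a map/reduce task.
            conf.setInt("yarn.app.mapreduce.am.resource.mb", 4096);      // AM container size (MB)
            conf.set("yarn.app.mapreduce.am.command-opts", "-Xmx3072m"); // AM JVM heap

            Job job = Job.getInstance(conf, "parquet-etl");
            // ... set input/output formats, mapper/reducer, paths, then job.waitForCompletion(true)
        }
    }

If the summary file is still wanted, producing fewer, larger Parquet output files also helps, since the committer has fewer footers to merge in memory.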
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)