[ https://issues.apache.org/jira/browse/PARQUET-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788297#comment-16788297 ]

Qinghui Xu commented on PARQUET-282:
------------------------------------

This does not look like a problem in parquet-mr itself. Shall we close it?
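
For reference, the OOM here happens in the MR ApplicationMaster while commitJob builds the _metadata summary file, so the usual workarounds are to skip the summary file or to give the AM more heap. Below is a minimal sketch of a driver configuration, not the reporter's actual job; property names are from parquet-mr 1.6.x / Hadoop 2.x, and the job name and heap sizes are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ParquetCommitConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Skip writing the _metadata summary file during job commit
        // (ParquetOutputFormat.ENABLE_JOB_SUMMARY in parquet-mr).
        conf.setBoolean("parquet.enable.summary-metadata", false);
        // Or raise the ApplicationMaster heap, since the OOM happens there
        // rather than in the map/reduce tasks (placeholder sizes).
        conf.setInt("yarn.app.mapreduce.am.resource.mb", 4096);
        conf.set("yarn.app.mapreduce.am.command-opts", "-Xmx3276m");
        Job job = Job.getInstance(conf, "parquet-etl");
        // ... set input/output formats and paths as in the real job ...
    }
}

With the summary file disabled, the per-row-group footers never have to be merged on a single node, which avoids this class of OOM at the cost of not producing the _metadata file.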

> OutOfMemoryError in job commit / ParquetMetadataConverter
> ---------------------------------------------------------
>
>                 Key: PARQUET-282
>                 URL: https://issues.apache.org/jira/browse/PARQUET-282
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.6.0
>         Environment: CentOS, MapR, Scalding
>            Reporter: hy5446
>            Priority: Critical
>
> We're trying to write some 14B rows (about 3.6 TB of Parquet data) to Parquet 
> files. When our ETL job finishes, it throws the exception below and the status 
> is "died in job commit":
> 2015-05-14 09:24:28,158 FATAL [CommitterEvent Processor #4] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[CommitterEvent Processor #4,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>       at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
>       at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
>       at parquet.format.Statistics.setMin(Statistics.java:237)
>       at parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:243)
>       at parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:167)
>       at parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:79)
>       at parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:405)
>       at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:433)
>       at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:423)
>       at parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
>       at parquet.hadoop.mapred.MapredParquetOutputCommitter.commitJob(MapredParquetOutputCommitter.java:43)
>       at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:259)
>       at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:253)
>       at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> This seems to be related to the creation of the _metadata summary file, since 
> the Parquet data files themselves are perfectly fine and usable. I'm also not 
> sure how to alleviate this (e.g. by adding more heap space), since the crash 
> happens outside the Map/Reduce tasks themselves and appears to be in the 
> job/application controller (the MR application master) itself. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
