hy5446 created PARQUET-282:
------------------------------

             Summary: OutOfMemoryError in job commit / ParquetMetadataConverter
                 Key: PARQUET-282
                 URL: https://issues.apache.org/jira/browse/PARQUET-282
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.6.0
         Environment: CentOS, MapR, Scalding
            Reporter: hy5446
            Priority: Critical


We're trying to write about 14B rows (roughly 3.6 TB of Parquet data) to 
Parquet files. When our ETL job finishes, it throws the exception below, and 
the job status is "died in job commit".

2015-05-14 09:24:28,158 FATAL [CommitterEvent Processor #4] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[CommitterEvent Processor #4,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
        at parquet.format.Statistics.setMin(Statistics.java:237)
        at parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:243)
        at parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:167)
        at parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:79)
        at parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:405)
        at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:433)
        at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:423)
        at parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
        at parquet.hadoop.mapred.MapredParquetOutputCommitter.commitJob(MapredParquetOutputCommitter.java:43)
        at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:259)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:253)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

This seems to be related to creation of the _metadata summary file, as the 
Parquet data files themselves are perfectly fine and usable. I'm also not sure 
how to alleviate this (e.g. by adding more heap space), since the crash happens 
outside the Map/Reduce tasks themselves, apparently in the job/application 
master.
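
If the summary file is indeed the culprit, a possible workaround is sketched 
below. This is untested against this particular job; the configuration keys are 
the standard parquet-mr and Hadoop/YARN ones, while the jar name, main class, 
and arguments are placeholders:

```shell
# Option 1: skip writing the _metadata summary file entirely
# (parquet-mr's ParquetOutputCommitter checks this flag).
hadoop jar my-etl-job.jar com.example.MyJob \
  -Dparquet.enable.summary-metadata=false \
  <job args>

# Option 2: give the MapReduce ApplicationMaster (where commitJob runs,
# and hence where this OOM occurs) more heap.
hadoop jar my-etl-job.jar com.example.MyJob \
  -Dyarn.app.mapreduce.am.resource.mb=8192 \
  -Dyarn.app.mapreduce.am.command-opts=-Xmx6g \
  <job args>
```

Option 1 trades away the summary-file optimization for readers; option 2 only 
postpones the problem as the number of row groups (and hence footer statistics) 
grows.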



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
