[ https://issues.apache.org/jira/browse/SPARK-32742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187751#comment-17187751 ]
Ryan Luo commented on SPARK-32742:
----------------------------------

[~hyukjin.kwon] Thanks for advising.

> FileOutputCommitter warns "No Output found for attempt"
> -------------------------------------------------------
>
>                 Key: SPARK-32742
>                 URL: https://issues.apache.org/jira/browse/SPARK-32742
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>         Environment: Hadoop 2.6.0-cdh5.16.2
> YARN(MR2 included)
>            Reporter: Ryan Luo
>            Priority: Major
>
> Hi team,
> This is my first time reporting an issue here.
> We submitted and ran a Spark job on the cluster.
> We found that one of the Parquet output partitions is missing from the output
> directory. We checked the Spark job log, and all tasks show a status of
> success. The output record size matches the expected number.
> However, when we checked the container log, we found a warning saying
> *No Output found for attempt_20200819094307_0003_m_000002_11*, which stopped
> the output from being moved from taskAttemptPath to the output directory. As a
> result, we are missing some of the output rows.
> Re-running the job resolved the issue; however, the report is critical for
> us. We would appreciate it if you could advise on the cause of the issue.
>
> Below are the container logs:
>
> {code:java}
> 20/08/19 09:44:57 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
> 20/08/19 09:44:57 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using user defined output committer class parquet.hadoop.ParquetOutputCommitter
> 20/08/19 09:44:57 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
> 20/08/19 09:44:57 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
> 20/08/19 09:44:57 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class parquet.hadoop.ParquetOutputCommitter
> 20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 12.370642 ms
> 20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 6.927118 ms
> 20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 12.004204 ms
> 20/08/19 09:44:57 INFO parquet.ParquetWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:
> ..... (skipped)
> 20/08/19 09:44:57 WARN output.FileOutputCommitter: No Output found for attempt_20200819094307_0003_m_000002_11
> 20/08/19 09:44:57 INFO mapred.SparkHadoopMapRedUtil: attempt_20200819094307_0003_m_000002_11: Committed
> {code}
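
Since the job itself reported success, the warning in the container log was the only visible sign of the missing partition. As a rough sketch (not part of the original report), container logs could be scanned for that warning so affected task attempts are flagged before the output is trusted. The function name `attempts_without_output` and the regular expression are illustrative assumptions based only on the log lines quoted above; other Hadoop versions may word or wrap the message differently.

```python
import re
from typing import Iterable, List

# Assumed pattern: matches the WARN line exactly as quoted in the report.
NO_OUTPUT = re.compile(
    r"WARN output\.FileOutputCommitter: No Output found for\s+(attempt_\w+)"
)


def attempts_without_output(lines: Iterable[str]) -> List[str]:
    """Return the task attempt IDs for which the committer logged 'No Output found'."""
    found = []
    for line in lines:
        match = NO_OUTPUT.search(line)
        if match:
            found.append(match.group(1))
    return found


# Sample lines copied from the container log in the report.
sample = [
    "20/08/19 09:44:57 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2",
    "20/08/19 09:44:57 WARN output.FileOutputCommitter: No Output found for attempt_20200819094307_0003_m_000002_11",
    "20/08/19 09:44:57 INFO mapred.SparkHadoopMapRedUtil: attempt_20200819094307_0003_m_000002_11: Committed",
]

for attempt in attempts_without_output(sample):
    print("no output committed for", attempt)
```

A check like this only detects the symptom; it does not explain why the task attempt path was empty at commit time.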