[
https://issues.apache.org/jira/browse/SPARK-32742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Luo updated SPARK-32742:
-----------------------------
Description:
Hi team,
This is my first time reporting an issue here.
We submitted and ran a Spark job on the cluster and found that one of the
Parquet output partitions is missing from the output directory. In the Spark
job log, all tasks report success, and the output record count matches the
expected number.
However, the container log contains a warning, *No Output found for
attempt_20200819094307_0003_m_000002_11*, after which the output was not moved
from the taskAttemptPath to the output directory. As a result, we are missing
some of the output rows.
Re-running the job resolved the issue, but the report is critical for us, so
we would appreciate any advice on the cause.
Below are the container logs:
{code:java}
20/08/19 09:44:57 INFO output.FileOutputCommitter: FileOutputCommitter skip
cleanup _temporary folders under output directory:false, ignore cleanup
failures: false
20/08/19 09:44:57 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using user
defined output committer class parquet.hadoop.ParquetOutputCommitter
20/08/19 09:44:57 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 2
20/08/19 09:44:57 INFO output.FileOutputCommitter: FileOutputCommitter skip
cleanup _temporary folders under output directory:false, ignore cleanup
failures: false
20/08/19 09:44:57 INFO datasources.SQLHadoopMapReduceCommitProtocol: Using
output committer class parquet.hadoop.ParquetOutputCommitter
20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 12.370642 ms
20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 6.927118 ms
20/08/19 09:44:57 INFO codegen.CodeGenerator: Code generated in 12.004204 ms
20/08/19 09:44:57 INFO parquet.ParquetWriteSupport: Initialized Parquet
WriteSupport with Catalyst schema:
..... (skipped)
20/08/19 09:44:57 WARN output.FileOutputCommitter: No Output found for
attempt_20200819094307_0003_m_000002_11
20/08/19 09:44:57 INFO mapred.SparkHadoopMapRedUtil:
attempt_20200819094307_0003_m_000002_11: Committed
{code}
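One observation on the log above: it reports "File Output Committer Algorithm version is 2". With the v2 algorithm, each task attempt renames its output into the final directory at task commit time, and a task attempt whose attempt path is missing produces exactly this "No Output found" warning while the commit still reports success. A workaround worth trying (not a confirmed fix for the root cause) is pinning the job to the more conservative v1 algorithm, which moves files at job commit instead. A minimal sketch, where `your_job.py` is a placeholder for the actual application:

```shell
# Hypothetical invocation: force FileOutputCommitter algorithm v1.
# spark.hadoop.* properties are passed through to the Hadoop Configuration,
# so this sets mapreduce.fileoutputcommitter.algorithm.version for the job.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 \
  your_job.py
```

The same property can also be set in spark-defaults.conf. It only changes when task output is renamed into the destination, not what the tasks write, so it should be safe to experiment with.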
> FileOutputCommitter warns "No Output found for attempt"
> -------------------------------------------------------
>
> Key: SPARK-32742
> URL: https://issues.apache.org/jira/browse/SPARK-32742
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0
> Environment: Hadoop 2.6.0-cdh5.16.2
> YARN(MR2 included)
>
> Reporter: Ryan Luo
> Priority: Blocker
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]