aokolnychyi opened a new pull request, #6876: URL: https://github.com/apache/iceberg/pull/6876
This PR improves our task and job abort handling in Spark 3.3.

- This change leverages bulk deletes whenever possible.
- This change adds helpful log messages that indicate how many files were deleted and the task context, if any.

Without bulk deletes:

```
[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 0, attempt 0, stage 0.0)
[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) (partition 0 (task 0, attempt 0, stage 0.0))
...
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) is aborting.
[Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) (job abort)
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) aborted.
```

With bulk deletes:

```
[Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 4, attempt 0, stage 3.0)
[Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) using bulk deletes (partition 0 (task 4, attempt 0, stage 3.0))
...
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) is aborting.
[Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) using bulk deletes (job abort)
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) aborted.
```
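The logs above show two cleanup paths: one reports a plain count and the other reports "using bulk deletes." The sketch below illustrates the general "bulk delete when supported, otherwise per-file" pattern and produces log lines in the same shape. It is an illustration under stated assumptions, not the PR's `SparkCleanupUtil`. `CleanupSketch`, `FileDeleter`, and `BulkFileDeleter` are hypothetical, dependency-free stand-ins, roughly analogous to Iceberg's `FileIO` and `SupportsBulkOperations`. The sketch uses `java.util.logging` only to avoid external dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical illustration; not the actual Iceberg SparkCleanupUtil.
public class CleanupSketch {
  private static final Logger LOG = Logger.getLogger(CleanupSketch.class.getName());

  /** Hypothetical stand-in for a storage client that deletes one file per call. */
  public interface FileDeleter {
    void deleteFile(String path);
  }

  /** Hypothetical stand-in for a client that can also delete many files in one request. */
  public interface BulkFileDeleter extends FileDeleter {
    void deleteFiles(Iterable<String> paths);
  }

  /**
   * Best-effort cleanup: issues a single bulk call when the client supports it and
   * otherwise deletes files one by one. Never throws a RuntimeException; returns the
   * number of files it counts as deleted. The context string (e.g. "job abort") is
   * appended to the log line.
   */
  public static int deleteFiles(String context, FileDeleter io, List<String> paths) {
    if (io instanceof BulkFileDeleter) {
      try {
        ((BulkFileDeleter) io).deleteFiles(paths);
        LOG.info(String.format("Deleted %d file(s) using bulk deletes (%s)", paths.size(), context));
        return paths.size();
      } catch (RuntimeException e) {
        // Simplification: this sketch does not track partial progress of a failed bulk call.
        LOG.warning(String.format("Bulk delete failed (%s): %s", context, e.getMessage()));
        return 0;
      }
    }

    int deleted = 0;
    for (String path : paths) {
      try {
        io.deleteFile(path);
        deleted++;
      } catch (RuntimeException e) {
        // Keep going: one undeletable file should not stop cleanup of the rest.
        LOG.warning(String.format("Failed to delete %s (%s): %s", path, context, e.getMessage()));
      }
    }
    LOG.info(String.format("Deleted %d file(s) (%s)", deleted, context));
    return deleted;
  }

  public static void main(String[] args) {
    // Example paths are made up for illustration.
    List<String> removed = new ArrayList<>();
    FileDeleter perFile = removed::add;
    deleteFiles("job abort", perFile, List.of("data/a.parquet", "data/b.parquet"));
    System.out.println(removed); // prints [data/a.parquet, data/b.parquet]
  }
}
```

Abort-time cleanup is best-effort by design. Each per-file failure is logged and skipped, so the original write error that triggered the abort is not masked by a secondary cleanup exception.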
