[GitHub] [iceberg] aokolnychyi opened a new pull request, #6876: Spark 3.3: Improve task and job abort handling

via GitHub Fri, 17 Feb 2023 12:17:41 -0800


aokolnychyi opened a new pull request, #6876:
URL: https://github.com/apache/iceberg/pull/6876


   This PR improves our task and job abort handling in Spark 3.3.
   
   - This change leverages bulk deletes whenever possible.
   - This change adds helpful log messages that indicate how many files were deleted and the task context, if any.
   
   ```
   [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 0, attempt 0, stage 0.0)
   [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) (partition 0 (task 0, attempt 0, stage 0.0))
   ...
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) is aborting.
   [Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) (job abort)
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) aborted.
   ```
   
   ```
   [Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 4, attempt 0, stage 3.0)
   [Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) using bulk deletes (partition 0 (task 4, attempt 0, stage 3.0))
   ...
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) is aborting.
   [Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) using bulk deletes (job abort)
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) aborted.
   ```
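
For readers who want the general shape of the cleanup path, here is a minimal sketch of "use a bulk delete when the file store supports it, otherwise delete file by file, then log the count with the context". It is not the code in this PR. `CleanupSketch`, `FileStore`, `BulkFileStore`, and the two in-memory stores are hypothetical stand-ins invented for illustration, and only the message format is modeled on the log lines above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: every type here is a hypothetical stand-in,
// not Iceberg's FileIO API and not the actual SparkCleanupUtil.
final class CleanupSketch {

  /** Hypothetical minimal file store that deletes one path at a time. */
  interface FileStore {
    void deleteFile(String path);
  }

  /** Hypothetical extension for stores that can delete many paths in one call. */
  interface BulkFileStore extends FileStore {
    void deleteFiles(Iterable<String> paths);
  }

  private CleanupSketch() {}

  /**
   * Deletes the given paths, preferring one bulk call when the store supports it,
   * and returns a summary message that carries the caller-supplied context
   * (for example "job abort" or a task description).
   */
  static String deleteFiles(FileStore store, List<String> paths, String context) {
    if (store instanceof BulkFileStore) {
      // One round trip for all paths. This sketch assumes the bulk call succeeds;
      // a real implementation would also need to account for partial failures.
      ((BulkFileStore) store).deleteFiles(paths);
      return String.format("Deleted %d file(s) using bulk deletes (%s)", paths.size(), context);
    }

    int deleted = 0;
    for (String path : paths) {
      try {
        store.deleteFile(path);
        deleted++;
      } catch (RuntimeException e) {
        // Best-effort cleanup: skip the failed path and keep going.
      }
    }
    return String.format("Deleted %d file(s) (%s)", deleted, context);
  }

  /** In-memory store without bulk support, used only for the demo. */
  static class SimpleStore implements FileStore {
    final Set<String> files = new HashSet<>();

    SimpleStore(List<String> initial) {
      files.addAll(initial);
    }

    @Override
    public void deleteFile(String path) {
      files.remove(path);
    }
  }

  /** In-memory store with bulk support; counts how many bulk calls were made. */
  static class BulkStore extends SimpleStore implements BulkFileStore {
    int bulkCalls = 0;

    BulkStore(List<String> initial) {
      super(initial);
    }

    @Override
    public void deleteFiles(Iterable<String> paths) {
      bulkCalls++;
      for (String path : paths) {
        files.remove(path);
      }
    }
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList("data/a.parquet", "data/b.parquet");
    System.out.println(deleteFiles(new SimpleStore(paths), paths, "job abort"));
    System.out.println(deleteFiles(new BulkStore(paths), paths, "job abort"));
  }
}
```

Passing the context in as a plain string keeps the helper usable from both task abort and job abort, which is why the same message shape can appear with or without task details.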



