aokolnychyi opened a new pull request, #6876:
URL: https://github.com/apache/iceberg/pull/6876

   This PR improves our task and job abort handling in Spark 3.3.
   
   - This change uses bulk deletes whenever the table's `FileIO` supports them.
   - This change adds log messages that report how many files were deleted and, for task aborts, the task context.
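A minimal sketch of the "bulk deletes whenever possible" idea follows. It is not the code in this PR. It assumes two hypothetical interfaces, `SimpleFileIO` and `BulkDeleteCapable`, that stand in for Iceberg's `FileIO` and its bulk-delete capability (exposed as `SupportsBulkOperations` in recent Iceberg releases). The sketch checks the capability with `instanceof`, issues a single bulk call when it is available, falls back to per-file deletes otherwise, and logs the count plus a caller-supplied context string in the same shape as the sample logs.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

final class CleanupSketch {
  private static final Logger LOG = Logger.getLogger(CleanupSketch.class.getName());

  /** Hypothetical stand-in for a file IO that can only delete one file per call. */
  interface SimpleFileIO {
    void deleteFile(String path);
  }

  /** Hypothetical capability interface: an IO that can also delete many files in one call. */
  interface BulkDeleteCapable extends SimpleFileIO {
    void deleteFiles(Iterable<String> paths);
  }

  /**
   * Deletes the given paths, preferring one bulk call when the IO supports it, and logs how many
   * files were deleted together with a caller-supplied context string. Failures are logged rather
   * than rethrown so that cleanup does not mask the original write error.
   *
   * @return the number of files this sketch counts as deleted
   */
  static int deleteFiles(SimpleFileIO io, List<String> paths, String context) {
    if (io instanceof BulkDeleteCapable) {
      int bulkDeleted;
      try {
        ((BulkDeleteCapable) io).deleteFiles(paths);
        bulkDeleted = paths.size();
      } catch (RuntimeException e) {
        // Assumption: treat a failed bulk call as "nothing deleted"; a real implementation
        // could inspect the failure to report a partial count.
        LOG.warning("Bulk delete failed: " + e.getMessage());
        bulkDeleted = 0;
      }
      LOG.info(String.format("Deleted %d file(s) using bulk deletes (%s)", bulkDeleted, context));
      return bulkDeleted;
    }

    // Fallback: delete one file at a time, counting only successful deletes.
    int deleted = 0;
    for (String path : paths) {
      try {
        io.deleteFile(path);
        deleted++;
      } catch (RuntimeException e) {
        LOG.warning("Failed to delete " + path + ": " + e.getMessage());
      }
    }
    LOG.info(String.format("Deleted %d file(s) (%s)", deleted, context));
    return deleted;
  }

  public static void main(String[] args) {
    Set<String> files = ConcurrentHashMap.newKeySet();
    files.addAll(List.of("data/a.parquet", "data/b.parquet", "data/c.parquet", "data/d.parquet"));

    // One-at-a-time IO backed by an in-memory set.
    SimpleFileIO perFileIo = files::remove;

    // Bulk-capable IO backed by the same set.
    BulkDeleteCapable bulkIo =
        new BulkDeleteCapable() {
          @Override
          public void deleteFile(String path) {
            files.remove(path);
          }

          @Override
          public void deleteFiles(Iterable<String> paths) {
            paths.forEach(files::remove);
          }
        };

    deleteFiles(perFileIo, List.of("data/a.parquet", "data/b.parquet"), "partition 0 (task 0, attempt 0, stage 0.0)");
    deleteFiles(bulkIo, List.of("data/c.parquet", "data/d.parquet"), "partition 0 (task 4, attempt 0, stage 3.0)");
    System.out.println("Remaining files: " + files.size());
  }
}
```

Checking the capability at runtime keeps a single cleanup path for every IO implementation: stores with a native multi-object delete get one request, and everything else still works through per-file deletes.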
   
   Without bulk deletes:
   
   ```
   [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 0, attempt 0, stage 0.0)
   [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) (partition 0 (task 0, attempt 0, stage 0.0))
   ...
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) is aborting.
   [Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) (job abort)
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) aborted.
   ```
   
   With bulk deletes:
   
   ```
   [Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 4, attempt 0, stage 3.0)
   [Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) using bulk deletes (partition 0 (task 4, attempt 0, stage 3.0))
   ...
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) is aborting.
   [Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) using bulk deletes (job abort)
   [Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) aborted.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
