dchristle commented on issue #3703:
URL: https://github.com/apache/iceberg/issues/3703#issuecomment-1402362942
I'm following up to say I got `deleteOrphanFiles` to complete successfully.
After bumping the driver memory, I was confused that the logs showed nothing
except an occasional `RetryHttpInitializer: Encountered status code 503 when
sending DELETE request to URL` error. I let it run for more than 24 hours; it
seemed like the driver was hung rather than deleting any orphan files.
Other GitHub issues about deleting orphan files suggest increasing the number
of delete threads, so I modified my Spark job to pass a thread pool to
`.executeDeleteWith`:
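```scala
import java.util.concurrent.Executors
import org.apache.iceberg.spark.actions.SparkActions

// Run the orphan-file deletes on a pool of 30 threads
val executorService = Executors.newFixedThreadPool(30)
try {
  SparkActions
    .get()
    .deleteOrphanFiles(icebergTable) // icebergTable: org.apache.iceberg.Table
    .executeDeleteWith(executorService)
    .execute()
} finally {
  executorService.shutdown() // release the pool's threads once deletes finish
}
```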
The frequency of the 503 retry errors went up. My interpretation is that each
Google Cloud Storage delete request has some small, fixed probability of
hitting this error. With 30 concurrent delete operations, more requests are
issued per unit of time, so the log message appears more often.
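To make that interpretation concrete, here is a back-of-the-envelope sketch with hypothetical numbers (not measurements): if each DELETE independently returns a 503 with probability p, the expected number of retry log lines per hour is p × threads × deletes per second per thread × 3600. The probability and per-thread rate below are made-up placeholders, and the model assumes each thread keeps the same throughput when 30 of them run at once.

```scala
// Hypothetical model: every DELETE fails with a 503 independently with a
// small fixed probability p, so expected 503s scale linearly with the
// total request rate (threads x per-thread rate).
object RetryRateEstimate {
  /** Expected 503s per hour for `threads` workers, each issuing `reqPerSecPerThread` deletes/sec. */
  def expected503sPerHour(p: Double, threads: Int, reqPerSecPerThread: Double): Double =
    p * threads * reqPerSecPerThread * 3600.0

  def main(args: Array[String]): Unit = {
    val p = 0.001       // assumed per-request 503 probability (placeholder)
    val perThread = 5.0 // assumed deletes per second per thread (placeholder)
    val single = expected503sPerHour(p, 1, perThread)
    val pooled = expected503sPerHour(p, 30, perThread)
    println(f"1 thread: $single%.1f/h, 30 threads: $pooled%.1f/h")
  }
}
```

Under these assumptions the 30-thread job logs about 30 times as many 503 retries per hour even though the per-request failure rate is unchanged.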
I let this new job run for about 36 hours, and it finished deleting the orphan
files successfully. I wonder if there's some way to emit periodic log messages,
perhaps every 5 minutes, indicating how many files have been deleted so far.
Once my driver had sufficient memory, the deletes were likely happening
correctly, but as a user I was confused by the lack of log output. The delete
orphan files operation is different from other maintenance operations: it
can't be seen in the Spark UI as a job or stage.
Any thoughts on adding some periodic log outputs? @RussellSpitzer
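As a rough sketch of what such periodic logging might look like from the user side: the hypothetical `DeleteProgressReporter` class below (not part of Iceberg or Spark) wraps a `String => Unit` delete function, counts each completed delete in an `AtomicLong` so it is safe to call from many threads, and uses a single-threaded `ScheduledExecutorService` to log the running total at a fixed interval such as 5 minutes.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper, not an Iceberg API: counts successful deletes and
// logs the running total every `interval` on a background thread.
final class DeleteProgressReporter(
    delete: String => Unit,
    interval: Long,
    unit: TimeUnit,
    log: String => Unit = (s: String) => println(s)) {

  private val deleted = new AtomicLong(0L)
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  // First report after one interval, then once per interval.
  scheduler.scheduleAtFixedRate(
    () => log(s"Deleted ${deleted.get()} orphan files so far"),
    interval, interval, unit)

  /** Deletes one path; the counter only advances if `delete` returns normally. */
  def apply(path: String): Unit = {
    delete(path)
    deleted.incrementAndGet()
  }

  def count: Long = deleted.get()

  /** Cancels the periodic report and logs the final total. */
  def close(): Unit = {
    scheduler.shutdown()
    log(s"Deleted ${deleted.get()} orphan files in total")
  }
}
```

Assuming the `deleteWith(Consumer<String>)` option on the delete-orphan-files action is available in the Iceberg version in use, a reporter built as `new DeleteProgressReporter(icebergTable.io().deleteFile(_), 5, TimeUnit.MINUTES)` could presumably be passed via `.deleteWith(path => reporter(path))` next to `.executeDeleteWith(executorService)`, with `reporter.close()` called after `execute()` returns so the scheduler thread stops.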
--
This is an automated message from the Apache Git Service.