sririshindra commented on PR #7127: URL: https://github.com/apache/iceberg/pull/7127#issuecomment-1492784798
> > One of the common causes of delete failure in the public cloud is hitting the API quotas and unnecessary re-runs of the delete action
>
> Not really. What page size are you using when issuing delete requests to S3? Each object, even in a bulk delete, is one write IOP; if you send a full 1000-entry list, then under certain conditions the sleep-and-retry of that request is its own thundering herd. Since [HADOOP-16823](https://issues.apache.org/jira/browse/HADOOP-16823) we've had a default page size of 200 and nobody complains about 503-triggered deletion failures, even when partition rebalancing operations massively reduce IO capacity. If you are seeing problems in the S3A connector, then complain. If you are seeing it in your own code, why not fix it there rather than expose the failure to apps?

Hi Steve, I am partially summarizing our offline conversation here. The statement that "causes of delete failure in the public cloud is hitting the API quotas" may be wrong, but I think this PR still adds value. When the remove orphan files procedure is called, it currently displays the list of files that were supposed to be deleted, but it gives no information about whether those files were actually deleted. This PR simply displays the status of each delete alongside the file itself; if a delete fails for some reason, the exception message is displayed as well.
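To make that concrete, here is a minimal sketch of the kind of per-file bookkeeping described above, assuming Iceberg's `FileIO#deleteFile`. The `OrphanFileResult` shape and the status strings are hypothetical illustrations, not the PR's actual output columns:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.iceberg.io.FileIO;

// Hypothetical result row: file location, delete status, and any error message.
record OrphanFileResult(String location, String status, String error) {}

class OrphanFileDeleter {
  // Attempt to delete each candidate orphan file and record the outcome,
  // rather than only listing the candidates that were supposed to be deleted.
  static List<OrphanFileResult> deleteAll(FileIO io, List<String> candidates) {
    List<OrphanFileResult> results = new ArrayList<>();
    for (String location : candidates) {
      try {
        io.deleteFile(location);
        results.add(new OrphanFileResult(location, "DELETED", null));
      } catch (RuntimeException e) {
        // Surface the failure and its exception message instead of dropping it.
        results.add(new OrphanFileResult(location, "FAILED", e.getMessage()));
      }
    }
    return results;
  }
}
```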
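For reference on the page-size point in the quote above: the S3A bulk delete page size is controlled by the `fs.s3a.bulk.delete.page.size` property. A minimal sketch of tuning it through Hadoop's `Configuration` API, with the value mirroring the default cited in the quote:

```java
import org.apache.hadoop.conf.Configuration;

public class S3ADeletePageSize {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Caps how many keys go into a single S3 DeleteObjects request;
    // smaller pages mean a throttled request retries less work at once.
    conf.setInt("fs.s3a.bulk.delete.page.size", 200);
  }
}
```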
