[ https://issues.apache.org/jira/browse/HADOOP-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554789#comment-16554789 ]

Steve Jacobs commented on HADOOP-15628:
---------------------------------------

We have an in-house object store with a bug related to multi-deletes: if the 
ACCESS_KEY doesn't own the bucket, the multi-delete ALWAYS fails. I checked the 
AWS docs and found that the response codes coming back were correct, and the 
XML was as well; it just wasn't being parsed by HDFS. (And it's not the only 
tool not checking this either; S3CMD doesn't either.) I would imagine you can 
reproduce this with bucket / IAM policies as well, but I haven't done so yet.
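
For what it's worth, those per-key failures are visible to the client if the 
result (or the exception the SDK raises) from deleteObjects is actually 
inspected. Here's a rough sketch of what that check could look like with the 
v1 AWS SDK classes hadoop-aws already uses; the helper method and variable 
names are just illustrative, not the actual S3AFileSystem code:

{code:java}
import java.io.IOException;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsResult;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

/**
 * Delete a batch of keys and fail loudly if any key could not be deleted,
 * rather than trusting the 200 OK on the multi-delete response.
 */
static void deleteAndVerify(AmazonS3 s3, String bucket, List<String> keys)
    throws IOException {
  DeleteObjectsRequest request =
      new DeleteObjectsRequest(bucket).withKeys(keys.toArray(new String[0]));
  try {
    DeleteObjectsResult result = s3.deleteObjects(request);
    // In non-quiet mode every successful key is echoed back, so a size
    // mismatch means something silently failed.
    if (result.getDeletedObjects().size() != keys.size()) {
      throw new IOException("Only " + result.getDeletedObjects().size()
          + " of " + keys.size() + " keys were deleted from " + bucket);
    }
  } catch (MultiObjectDeleteException e) {
    // The SDK raises this when the 200 response body contains <Error>
    // entries for individual keys (e.g. denied by a bucket policy).
    StringBuilder failed = new StringBuilder();
    for (MultiObjectDeleteException.DeleteError error : e.getErrors()) {
      failed.append(error.getKey()).append(" (").append(error.getCode())
          .append("); ");
    }
    throw new IOException("Failed to delete: " + failed, e);
  }
}
{code}

Either the size check or the catch block would have surfaced our failures 
instead of swallowing them.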

I'm currently running Hadoop 3.0.2 on the system where I'm reproducing this. 
Roger on not looking at the current rev; I'll work on getting a 3.1 install set 
up to test with. I checked everything except 3.1 and saw the same behavior, so 
that was a bad assumption on my part.

Unfortunately, because I'm using a custom object store, S3Guard isn't an option 
for me. (Supposedly this store is strongly consistent, though, so hopefully 
that won't cause me too much pain.)

I'll work on reproducing this on 3.1.

I'm also having fakeDir-related issues, and I'm aware of HADOOP-13230. Presto 
doesn't clean fakeDir files up. It's just made tracking down delete-related 
issues very confusing.

> S3A Filesystem does not check return from AmazonS3Client deleteObjects
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-15628
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15628
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.9.1, 2.8.4, 3.1.1, 3.0.3
>         Environment: Hadoop 3.0.2 / Hadoop 2.8.3
> Hive 2.3.2 / Hive 2.3.3 / Hive 3.0.0
>            Reporter: Steve Jacobs
>            Assignee: Steve Loughran
>            Priority: Minor
>
> Deletes in S3A that use the Multi-Delete functionality in the Amazon S3 API 
> do not check whether all objects have been successfully deleted. In the event 
> of a failure, the API will still return a 200 OK (which isn't checked 
> currently):
> [Delete Code from Hadoop 2.8|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L574]
>  
> {code:java}
> if (keysToDelete.size() == MAX_ENTRIES_TO_DELETE) {
>   DeleteObjectsRequest deleteRequest =
>       new DeleteObjectsRequest(bucket).withKeys(keysToDelete);
>   s3.deleteObjects(deleteRequest);
>   statistics.incrementWriteOps(1);
>   keysToDelete.clear();
> }
> {code}
> This should be converted to use the DeleteObjectsResult class returned by 
> AmazonS3Client: 
> [Amazon Code Example|https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.htm]
> {code:java}
> // Verify that the objects were deleted successfully.
> DeleteObjectsResult delObjRes =
>     s3Client.deleteObjects(multiObjectDeleteRequest);
> int successfulDeletes = delObjRes.getDeletedObjects().size();
> System.out.println(successfulDeletes + " objects successfully deleted.");
> {code}
> Bucket policies can be misconfigured, and deletes will then fail without any 
> warning from S3A clients.


