[ 
https://issues.apache.org/jira/browse/HADOOP-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243827#comment-16243827
 ] 

Steve Loughran commented on HADOOP-15003:
-----------------------------------------

Thanks for this; I'll look through the @After code too.
# All the other scale tests work in parallel, so I don't think it's their 
teardown; more likely something happening in a parallel test,
# which can include a job commit happening on a different path,
# as well as some test setup doing an autopurge of old stuff.
# The test is getting the filename inconsistent: not correctly working out the 
path of the pending file.

The fact that you can see it and I can't makes me think it's either some race 
condition or config. The fact that you can consistently see it means that it's 
unlikely to be a race, as that's going to fail intermittently. I'd consider 
parallel tests purging buckets. But: how is that going to clean up the file 
under __magic? It would lose the uncommitted MPUs, yes.

Here's my plan
# make sure that the code logs @ info when a magic file is saved, some message 
like "file $dest... ready to commit, commit metadata stored at $pending."
# failing test case: add some asserts in the test about the existence of 
pending files under a path. This will help differentiate purge of commit data 
from purge of files. Adding them in the test case which creates the file will 
show whether any delete/purge happens between the two
# add a test which explicitly deletes the MPU of a pending commit, see what 
happens
# add a test which explicitly runs two job commits in parallel to separate 
paths: verifies isolation.
# review troubleshooting docs. Once your list MPU CI
# Have all commands to purge pending MPUs actually list the files @ info. After 
all, in normal execution there shouldn't be any. 
That CLI you've proposed for listing MPUs, "hadoop s3guard uploads", will help 
diagnose stuff in the field
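
A minimal sketch of the kind of assert meant in the second step of the plan, to show the shape of the check rather than the real test code. It uses java.nio against a local directory instead of the S3A FileSystem API, and the class name, method names and the ".pending" suffix are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical test helper; not part of the S3A test suite. */
public class PendingFileAsserts {

  // Assumed suffix of the commit metadata file saved for a magic write.
  static final String PENDING_SUFFIX = ".pending";

  /** Recursively list every regular file under dir with the pending suffix. */
  public static List<Path> listPendingFiles(Path dir) throws IOException {
    try (Stream<Path> walk = Files.walk(dir)) {
      return walk.filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(PENDING_SUFFIX))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  /** Fail, listing what was actually found, if the count is not as expected. */
  public static void assertPendingFileCount(Path dir, int expected)
      throws IOException {
    List<Path> found = listPendingFiles(dir);
    if (found.size() != expected) {
      throw new AssertionError("Expected " + expected
          + " pending file(s) under " + dir
          + " but found " + found.size() + ": " + found);
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate a destination directory with one pending commit under __magic.
    Path dest = Files.createTempDirectory("dest");
    Path magic = Files.createDirectories(dest.resolve("__magic").resolve("job-01"));
    Path pending = Files.createFile(magic.resolve("part-0000.pending"));

    // Straight after the write: the commit metadata must be there.
    assertPendingFileCount(dest, 1);

    // After a simulated purge the same assert pins down what went missing.
    Files.delete(pending);
    assertPendingFileCount(dest, 0);
  }
}
```

Calling the assert immediately after the file is written, and again just before the commit, would narrow down when any purge happened; including the listing in the failure message separates "commit data gone" from "everything gone".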



> Merge S3A committers into trunk: Yetus patch checker
> ----------------------------------------------------
>
>                 Key: HADOOP-15003
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15003
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-041.patch, HADOOP-13786-042.patch, 
> HADOOP-13786-043.patch, HADOOP-13786-044.patch, HADOOP-13786-045.patch, 
> HADOOP-13786-046.patch
>
>
> This is a Yetus only JIRA created to have Yetus review the 
> HADOOP-13786/HADOOP-14971 patch as a .patch file, as the review PR 
> [https://github.com/apache/hadoop/pull/282] is stopping this happening in 
> HADOOP-14971.
> Reviews should go into the PR/other task



