[ 
https://issues.apache.org/jira/browse/IMPALA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732496#comment-16732496
 ] 

Steve Loughran commented on IMPALA-6544:
----------------------------------------

Yes: S3A create file checks whether something already exists at the path before creation

* if it's a directory: fail fast
* if it's a file and overwrite=false, fail

It's something we've discussed killing in the past, as when we know 
overwrite=true, all we care about is whether it's a directory or not: no need to 
HEAD the file.

The other thing is that with the newer createFile() API call, we can add an 
S3-specific option to say "skip all the existence checks". A bit dangerous, but 
very fast. You had better know what you are doing. The Flink team have asked for 
it already.

* If you switch to using S3Guard, DynamoDB gives the consistency
* If you aren't using it, you have other consistency issues lurking
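
To make the create-time checks and the two shortcuts concrete, here is a minimal sketch. It is a toy in-memory model, not the S3AFileSystem code: the class CreateCheckSketch, the skipChecks flag and the probe counter are all hypothetical names made up for illustration, and each probe stands in for whatever HEAD/LIST requests the real check issues.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the pre-create checks; NOT the real S3A implementation. */
final class CreateCheckSketch {
    enum Kind { FILE, DIRECTORY }

    /** Hypothetical stand-in for a bucket: path -> kind of entry. */
    private final Map<String, Kind> store = new HashMap<>();

    /** Number of simulated existence probes issued so far. */
    int probes = 0;

    void put(String path, Kind kind) {
        store.put(path, kind);
    }

    private boolean isDirectory(String path) {
        probes++; // one probe for the directory check
        return store.get(path) == Kind.DIRECTORY;
    }

    private boolean isFile(String path) {
        probes++; // one probe (a HEAD) on the object itself
        return store.get(path) == Kind.FILE;
    }

    /** Returns normally if the create may proceed, throws otherwise. */
    void checkBeforeCreate(String path, boolean overwrite, boolean skipChecks) {
        if (skipChecks) {
            return; // hypothetical opt-in: the caller takes full responsibility
        }
        if (isDirectory(path)) {
            throw new IllegalStateException(path + " is a directory");
        }
        if (overwrite) {
            return; // an existing file gets replaced anyway: no HEAD needed
        }
        if (isFile(path)) {
            throw new IllegalStateException(path + " already exists");
        }
    }

    public static void main(String[] args) {
        CreateCheckSketch s = new CreateCheckSketch();
        s.put("data/existing", Kind.FILE);
        s.checkBeforeCreate("data/existing", true, false);
        System.out.println("probes with overwrite=true: " + s.probes);
    }
}
```

In this model overwrite=true costs one probe instead of two, and the skip-everything option costs none, which is the whole appeal and the whole risk: with no probe, nothing stops a create on top of a directory.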

Looking at the rest of the stack (traces are always interesting), put is doing 
an upload to one path, then kicking off a rename; the rename needs its own src 
and data checks. Eliminate that temp file (remember, PUT to an object store is 
the atomic operation you need), and that'll strip out most of that IO.
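
A rough sketch of why the temp file hurts, assuming the usual object-store behaviour where a rename is a copy followed by a delete. The class PutVersusRenameSketch and the exact request sequence are illustrative assumptions, not a trace of what S3A or Impala really issue; the point is only the relative request count.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy request log comparing temp-file-plus-rename with a direct PUT. */
final class PutVersusRenameSketch {
    /** Simulated requests, in the order they would be sent. */
    final List<String> requests = new ArrayList<>();

    /** Upload to a temp path, then "rename" it into place. */
    void uploadViaTempAndRename(String tmp, String dest) {
        requests.add("PUT " + tmp);                  // upload to the temp path
        requests.add("HEAD " + tmp);                 // rename: source check (assumed)
        requests.add("HEAD " + dest);                // rename: destination check (assumed)
        requests.add("COPY " + tmp + " -> " + dest); // rename is copy...
        requests.add("DELETE " + tmp);               // ...followed by delete
    }

    /** Write straight to the final path; the PUT itself is atomic. */
    void uploadDirect(String dest) {
        requests.add("PUT " + dest);
    }

    public static void main(String[] args) {
        PutVersusRenameSketch viaRename = new PutVersusRenameSketch();
        viaRename.uploadViaTempAndRename("tbl/_tmp/part-0", "tbl/part-0");

        PutVersusRenameSketch direct = new PutVersusRenameSketch();
        direct.uploadDirect("tbl/part-0");

        System.out.println("temp + rename: " + viaRename.requests.size() + " requests");
        System.out.println("direct PUT:    " + direct.requests.size() + " requests");
    }
}
```

Every extra request in the rename path is also another place for an inconsistent listing or HEAD to bite, so dropping the temp file helps the flakiness as well as the IO.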



> Lack of S3 consistency leads to rare test failures
> --------------------------------------------------
>
>                 Key: IMPALA-6544
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6544
>             Project: IMPALA
>          Issue Type: Task
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: Sailesh Mukil
>            Priority: Major
>              Labels: S3, broken-build, consistency, flaky, test-framework
>
> Every now and then, we hit a flaky test on S3 runs due to files missing when 
> they should be present, and vice versa. We could consider running our tests 
> (or a subset of our tests) with S3Guard to avoid these problems, however rare 
> they are.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
