[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019741#comment-16019741
 ] 

Steve Loughran commented on HADOOP-13345:
-----------------------------------------

This is a read pipeline. What I think has happened is that the client called 
open(), and s3guard skipped the existence check because ddb said the file was 
there (and how long it was). The HTTP stream isn't set up in open(); it relies 
on the HEAD having done the check first (getFileStatus() is called to verify 
that the path isn't a dir; if the path isn't there, it fails). (Note: we could 
do a simpler check without the LIST call in the dir scan.)

Because the HEAD request is skipped with s3guard, it's only on the first seek 
that an attempt is made to GET the file contents. No file, error. There's 
nothing wrong with that per se; it just means that if s3guard is inconsistent 
with the store, the failure shows up later.
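
The deferred failure described above can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions: {{LazyOpenSketch}}, {{LazyStream}} and the two maps are hypothetical names, not S3A or S3Guard classes, and the real stream logic is more involved.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-ins; none of these are real S3A classes. */
class LazyOpenSketch {
    /** Stands in for the metadata store (DDB): path -> recorded length. */
    final Map<String, Long> metadataStore = new HashMap<>();
    /** Stands in for the object store (S3): path -> object bytes. */
    final Map<String, byte[]> objectStore = new HashMap<>();

    /** A stream whose "GET" is deferred until the first read. */
    class LazyStream {
        private final String path;
        private final long length;
        private byte[] data; // null until the first read fetches the object

        LazyStream(String path, long length) {
            this.path = path;
            this.length = length;
        }

        /** Reads one byte at pos, or returns -1 past the recorded length. */
        int read(long pos) throws FileNotFoundException {
            if (data == null) {
                // First contact with the object store happens here, not in open().
                data = objectStore.get(path);
                if (data == null) {
                    throw new FileNotFoundException(path);
                }
            }
            if (pos >= length || pos >= data.length) {
                return -1;
            }
            return data[(int) pos] & 0xff;
        }
    }

    /** Trusts a metadata entry if present; otherwise falls back to a "HEAD". */
    LazyStream open(String path) throws FileNotFoundException {
        Long length = metadataStore.get(path);
        if (length == null) {
            byte[] obj = objectStore.get(path);
            if (obj == null) {
                throw new FileNotFoundException(path);
            }
            length = (long) obj.length;
        }
        return new LazyStream(path, length);
    }
}
```

With an entry in the metadata store but no object in the object store, open() succeeds and the FileNotFoundException only appears on the first read().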

1. Could this be reported? e.g. when an FNFE is raised when opening a stream 
on an s3guarded bucket, warn the user that this may be an inconsistency.
2. S3AInputStream relies on the file length being normative (see 
{{calculateRequestLimit}}). If DDB thinks there is less data than there is, the 
extra data isn't picked up. You won't be able to seek past the amount of data 
that s3guard thinks is in the file, even if there is now more.
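
A minimal sketch of both points, assuming nothing about the real code: {{withHint}} and {{requestLimit}} are hypothetical helpers, and {{requestLimit}} is a simplified stand-in that does not reproduce the signature or logic of {{calculateRequestLimit}}.

```java
import java.io.FileNotFoundException;

/** Hypothetical helpers; names and signatures are illustrative, not S3A's. */
class StaleMetadataSketch {

    /** Point 1: add an inconsistency hint to an FNFE raised on a guarded bucket. */
    static FileNotFoundException withHint(FileNotFoundException e, boolean guarded) {
        if (!guarded) {
            return e;
        }
        FileNotFoundException hinted = new FileNotFoundException(
            e.getMessage()
                + " (the metadata store lists this path;"
                + " it may be inconsistent with the object store)");
        hinted.initCause(e);
        return hinted;
    }

    /**
     * Point 2: exclusive end offset of a ranged GET, capped at the length
     * recorded in the metadata store. Bytes beyond recordedLength are never
     * requested, even if the object in the store is now longer.
     */
    static long requestLimit(long targetPos, long readLength, long recordedLength) {
        return Math.min(recordedLength, targetPos + readLength);
    }
}
```

For example, with a recorded length of 60, a 100-byte read starting at offset 50 is limited to offset 60, whatever the object's actual size.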

We may want to have s3guard in non-auth mode do the HEAD on the final entry, 
for that failfast and to get the length. (Side topic: if we do that, and note 
the length is different, what to do in s3guard itself?) This could be done in 
the s3a input stream: if fadvise=normal, it could start with a full GET of the 
file & pick up content-length there. It's for the seek-optimised random IO that 
we'd want to postpone the GET until the first readFully(), and limit its length 
to something shorter.
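
One way the HEAD-on-open idea could look, as a sketch only: {{HeadOnOpenSketch}}, its fields and {{resolveLength}} are hypothetical, and the handling of a length mismatch is an assumption, since the question of what s3guard should do in that case is left open above.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the S3Guard implementation. */
class HeadOnOpenSketch {
    /** Stands in for the metadata store (DDB): path -> recorded length. */
    final Map<String, Long> metadataStore = new HashMap<>();
    /** Stands in for HEAD results from the object store: path -> content length. */
    final Map<String, Long> objectLengths = new HashMap<>();
    /** True when the metadata store is treated as authoritative. */
    boolean authoritative;

    /** Returns the length to use for the stream, failing fast if the object is gone. */
    long resolveLength(String path) throws FileNotFoundException {
        Long recorded = metadataStore.get(path);
        if (authoritative && recorded != null) {
            return recorded; // auth mode: trust the metadata store, no HEAD
        }
        Long actual = objectLengths.get(path); // the "HEAD"
        if (actual == null) {
            throw new FileNotFoundException(path);
        }
        // Assumption: on a mismatch, use the store's length for this stream
        // and leave the metadata store entry untouched.
        return actual;
    }
}
```

In non-auth mode this fails at open time when the object is missing, and a stale recorded length no longer caps how far the stream can seek.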

> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, s3c.001.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> S3GuardImprovedConsistencyforS3AV2.pdf
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
