[
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019741#comment-16019741
]
Steve Loughran commented on HADOOP-13345:
-----------------------------------------
This is a read pipeline. What I think has happened is the client did open(), and
s3guard skipped the existence check as ddb said the file was there (and how long
it was). The HTTP stream isn't set up in open(); it relies on the HEAD to have
done the check first (a getFileStatus() is called to verify the path isn't a
dir; if the path isn't there it fails). (Note: we could do a simpler check
without the LIST call in the dir scan.)
Because with s3guard the HEAD request is skipped, it's only on the first seek
that an attempt is made to GET the file contents. No file, error. There's
nothing wrong with that per se; it just means that if s3guard is inconsistent
with the store, things show up later.
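A minimal sketch of that sequence, to make the deferred failure concrete. The
class, the inner stream and the two maps are hypothetical stand-ins for the
object store and the DDB table; this is not the real S3A or S3Guard code.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a filesystem client with a metadata store in front.
class LazyOpenSketch {
    /** Simulated object store: key -> object bytes. */
    final Map<String, byte[]> store = new HashMap<>();
    /** Simulated metadata store (the DDB table): key -> recorded length. */
    final Map<String, Long> metadata = new HashMap<>();

    /** Stream whose GET is deferred until the first read. */
    class LazyStream {
        private final String key;
        private final long length;
        private byte[] data; // stays null until the first GET

        LazyStream(String key, long length) {
            this.key = key;
            this.length = length;
        }

        int read(long pos) throws FileNotFoundException {
            if (data == null) {
                // First contact with the store: this is where a missing
                // object finally surfaces.
                data = store.get(key);
                if (data == null) {
                    throw new FileNotFoundException(key
                        + ": listed in metadata store but missing in object store;"
                        + " possible inconsistency");
                }
            }
            if (pos >= length || pos >= data.length) {
                return -1;
            }
            return data[(int) pos] & 0xff;
        }
    }

    LazyStream open(String key) throws FileNotFoundException {
        Long recorded = metadata.get(key);
        if (recorded == null) {
            // No metadata entry: fall back to an existence check on the store.
            byte[] d = store.get(key);
            if (d == null) {
                throw new FileNotFoundException(key);
            }
            recorded = (long) d.length;
        }
        // Metadata entry present: trust it, no call to the store here.
        return new LazyStream(key, recorded);
    }
}
```

With an entry in the metadata map but nothing in the store map, open()
succeeds and the FileNotFoundException only appears on the first read().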
1. Could this be reported? e.g. when an FNFE is raised when opening a stream
on an s3guarded bucket, warn the user that this may be an inconsistency.
2. S3AInputStream relies on the file length being normative (see
{{calculateRequestLimit}}). If DDB thinks there is less data than there is, the
extra data isn't picked up. You won't be able to seek past the amount of data
that s3guard thinks is in the file, even if there is now more.
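To illustrate point 2, here is a simplified sketch of a request limit that is
clamped by the recorded length. It is not the actual {{calculateRequestLimit}}
code; the class and method names and the three-argument signature are
assumptions made for the example.

```java
// Hypothetical simplification: the end of a GET range is never allowed
// to go past the length the metadata store recorded for the file.
class RequestLimitSketch {

    /** Exclusive end offset of the range that would be requested. */
    static long requestLimit(long targetPos, long wanted, long recordedLength) {
        return Math.min(recordedLength, targetPos + wanted);
    }

    /** Number of bytes a read at targetPos can actually obtain. */
    static long readableBytes(long targetPos, long wanted, long recordedLength) {
        return Math.max(0L, requestLimit(targetPos, wanted, recordedLength) - targetPos);
    }
}
```

If the object now holds 150 bytes but the recorded length is still 100,
readableBytes(120, 10, 100) is 0, while the same call with the true length of
150 would allow 10 bytes.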
We may want to have s3guard in non-auth mode do the HEAD on the final entry, for
that fail-fast behaviour and to get the length. (Side topic: if we do that, and
note the length is different, what to do in s3guard itself?) (This could be done
in the s3a input stream: if fadvise=normal it could start with a full GET of
the file & pick up content-length there. It's for the seek-optimised random IO
that we'd want to postpone the GET until the first readFully(), and limit its
length to something shorter.)
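A rough sketch of what that HEAD comparison could look like, assuming the HEAD
result is modelled as an OptionalLong (empty when the object is missing). The
class, enum and method names are hypothetical, and what s3guard should do on a
length mismatch is left open here, as it is above.

```java
import java.util.OptionalLong;

// Hypothetical helper: compare the recorded length with what HEAD reports.
class HeadCheckSketch {
    enum Outcome { CONSISTENT, MISSING_IN_STORE, LENGTH_MISMATCH }

    /** Classify the metadata entry against the HEAD result. */
    static Outcome compare(long recordedLength, OptionalLong headLength) {
        if (!headLength.isPresent()) {
            // Fail fast: the entry exists in the metadata store only.
            return Outcome.MISSING_IN_STORE;
        }
        return headLength.getAsLong() == recordedLength
            ? Outcome.CONSISTENT
            : Outcome.LENGTH_MISMATCH;
    }

    /** Length the stream would use: the store's value when HEAD returned one. */
    static long lengthToUse(long recordedLength, OptionalLong headLength) {
        return headLength.orElse(recordedLength);
    }
}
```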
> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13345.prototype1.patch, s3c.001.patch,
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf,
> S3GuardImprovedConsistencyforS3AV2.pdf
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a
> stronger consistency model than what is currently offered. The solution
> coordinates with a strongly consistent external store to resolve
> inconsistencies caused by the S3 eventual consistency model.