[
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758432#comment-16758432
]
Ben Roling commented on HADOOP-16085:
-------------------------------------
Thanks for the thoughts [~fabbri]. I have an initial patch that I will upload
soon. The patch stores object versionId and as such only provides
read-after-overwrite protection if object versioning is enabled. If object
versioning is not enabled on the bucket, things would function the same as
before.
I hadn't really considered storing eTags instead. I'll look at the feasibility
of doing that, since it could remove the dependency on object versioning and
make the feature more broadly applicable. I think my organization is likely
to enable object versioning anyway, but if S3Guard doesn't depend on it then
more folks may benefit.
Thanks for your list of considerations. Here are some responses:
* The feature is always enabled and adds zero round trips. versionId was
already available on the PutObject response, so I'm just capturing it and
storing it. That's how it works in my patch, anyway. I'd welcome feedback
on whether you believe there should be a way to toggle the feature on or
off.
* There isn't really a conflict resolution policy. If we have a versionId in
the metadata, we provide it on the GetObject request. We either get back what
we are looking for or a 404. I'm guessing 404s are not going to happen (except
if the object is deleted outside the context of S3Guard, but that's out of
scope here). I assume read-after-overwrite inconsistencies in S3 generally
happen due to cache hits on the old version. When the version (eTag or
versionId) is explicitly specified, there should be no cache hit, and we would
get the same read-after-write consistency as on an initial PUT (no overwrite).
Even if I am wrong, the worst case is a FileNotFoundException, which is much
better than an inconsistent read with no error. Retries could be added on 404,
but maybe wait until it is proven they are necessary.
* I'm not trying to protect against a racing writer issue. I can add
something to the documentation about it.
* The changes are backward and forward compatible with existing buckets and
tables. The new versionId attribute is optional.
* MetadataStore expiry should be fine. The versionId is optional. If it
isn't there, no problem. The only risk of inconsistent read-after-overwrite is
if the metadata is purged more quickly than S3 itself becomes
read-after-overwrite consistent for the object being read. I can update
documentation to mention this.
* I guess with regard to HADOOP-15779, there could be a new type of S3Guard
metadata inconsistency. If an object is overwritten outside of S3Guard,
S3Guard will not have the correct eTag or versionId and the reader may end up
seeing either a 404 or the old content. Prior to this, the reader would see
whatever content S3 returns on a GET that is not qualified by eTag or versionId.
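To make the mechanism in the points above concrete, here is a toy, self-contained sketch in plain Python. None of these classes are real S3A or AWS SDK types; they only model the idea: a versioned store hands back a version id on put (at no extra round-trip cost), the metadata records it, and reads qualified by that version id either return exactly that version or fail like a 404, never silently returning stale data.

```python
# Toy model of the proposed versionId tracking. All names here are
# illustrative, not actual S3A/S3Guard classes.

class VersionedStore:
    """Stands in for an S3 bucket with object versioning enabled."""
    def __init__(self):
        self._objects = {}   # key -> list of (version_id, content)
        self._next_version = 0

    def put(self, key, content):
        # Like PutObject: the response already carries the new versionId,
        # so capturing it adds zero round trips.
        version_id = "v%d" % self._next_version
        self._next_version += 1
        self._objects.setdefault(key, []).append((version_id, content))
        return version_id

    def get(self, key, version_id=None):
        versions = self._objects.get(key)
        if not versions:
            raise FileNotFoundError(key)           # 404
        if version_id is None:
            return versions[-1][1]                 # unqualified GET
        for vid, content in versions:
            if vid == version_id:
                return content                     # exact version requested
        raise FileNotFoundError("%s@%s" % (key, version_id))  # 404, not staleness


class GuardedClient:
    """Stands in for S3A+S3Guard: records versionId on write, uses it on read."""
    def __init__(self, store):
        self.store = store
        self.metadata = {}   # key -> last written versionId (optional)

    def write(self, key, content):
        self.metadata[key] = self.store.put(key, content)

    def read(self, key):
        # Qualify the GET by the recorded versionId when we have one; the
        # worst case is FileNotFoundError, never an inconsistent read.
        return self.store.get(key, self.metadata.get(key))
```

In this toy model, an out-of-band overwrite leaves the recorded versionId pointing at the old content, which mirrors the new kind of metadata inconsistency noted in the last point: the guarded reader sees the old version (or a 404 if that version is gone) rather than whatever an unqualified GET would return.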
> S3Guard: use object version to protect against inconsistent read after
> replace/overwrite
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Ben Roling
> Priority: Major
>
> Currently S3Guard doesn't track S3 object versions. If a file is written in
> S3A with S3Guard and then subsequently overwritten, there is no protection
> against the next reader seeing the old version of the file instead of the new
> one.
> It seems like the S3Guard metadata could track the S3 object version. When a
> file is created or updated, the object version could be written to the
> S3Guard metadata. When a file is read, the read out of S3 could be performed
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my
> impression from looking through the code. My organization is looking to
> shift some datasets stored in HDFS over to S3 and is concerned about this
> potential issue as there are some cases in our codebase that would do an
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite
> track down any JIRAs discussing it. If there is one, feel free to close this
> with a reference to it.
> Am I understanding things correctly? Is this idea feasible? Any feedback
> that could be provided would be appreciated. We may consider crafting a
> patch.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)