[
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758577#comment-16758577
]
Steve Loughran commented on HADOOP-16085:
-----------------------------------------
The enemy here is eventual consistency, which is of course the whole reason
S3Guard was needed.
What issues are we worrying about?
# Mixed writers: some going through S3Guard, some not. Even in non-auth mode,
I worry about delete tombstones.
# Failure during a large operation, leaving S3 out of sync with the store.
# Failure during a workflow where one or more GET calls on the second attempt
pick up the old version.
HADOOP-15625 is going to address changes to an open file through etag
comparison, but without the etag being cached in the S3Guard repo, it's not
going to detect inconsistencies between the version expected and the version
read.
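The cached-etag idea can be sketched as follows. This is a toy Python model, not the S3A implementation: the in-memory dicts stand in for the S3 bucket and the S3Guard table, and all names here are hypothetical.

```python
# Toy model: the metadata store caches the etag recorded at write time,
# and a read checks the etag served by "S3" against it.

class StaleReadError(Exception):
    pass

s3 = {}        # key -> (etag, body): stands in for the S3 bucket
metadata = {}  # key -> etag: stands in for the S3Guard table

def write(key, body, etag):
    s3[key] = (etag, body)
    metadata[key] = etag  # cache the etag alongside the metadata entry

def read(key):
    etag, body = s3[key]
    expected = metadata.get(key)
    if expected is not None and etag != expected:
        # An eventually consistent GET returned an older version.
        raise StaleReadError(f"{key}: expected etag {expected}, got {etag}")
    return body

write("data.csv", b"v2", etag="etag-2")
s3["data.csv"] = ("etag-1", b"v1")  # simulate a stale replica serving the old copy
try:
    read("data.csv")
except StaleReadError:
    print("stale read detected")
```

Against real S3 the same check could be pushed server-side, e.g. via a conditional GET with an `If-Match` header, so a mismatched etag fails the request rather than returning stale bytes.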
Personally, I'm kind of reluctant to rely on S3Guard as the sole defence
against this problem.
bq. a re-run of a pipeline stage should always use a new output directory,
If you use the S3A committers for your work with the default mode (insert a
GUID into the filename), then filenames are always unique, and it becomes
impossible to get a read-after-write (RAW) inconsistency. This is essentially
where we are going, along with Apache Iceberg (incubating): rather than jump
through hoop after hoop of workarounds for S3's apparent decision to never
deliver consistent views, come up with data structures which only need one
point of consistency (you need to know the unique filename of the latest
Iceberg file).
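The unique-filename trick is simple enough to show in a few lines. A hedged sketch: `uuid4` here is just one way to generate the GUID, not necessarily what the committers use, and `unique_name` is a hypothetical helper.

```python
import uuid

def unique_name(prefix, suffix):
    # Each attempt writes a fresh object key, so a retry can never
    # collide with (or be shadowed by) an earlier attempt's output.
    return f"{prefix}-{uuid.uuid4()}{suffix}"

a = unique_name("part-0000", ".parquet")
b = unique_name("part-0000", ".parquet")
print(a != b)  # two attempts, two distinct keys
```

Because no object key is ever overwritten, a reader can only ever see a given key's single immutable version; the one remaining point of consistency is learning which key is current.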
Putting that aside, yes, keeping version markers would be good. I like etags
because they are exposed in getFileChecksum(); their flaw is that they can be
very large on massive MPUs (32 bytes per block uploaded).
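To put that size concern in numbers, a back-of-the-envelope calculation, assuming the 32 bytes per uploaded block stated above and S3's documented 10,000-part cap on a multipart upload:

```python
# Worst-case checksum size for a maximal multipart upload.
MAX_PARTS = 10_000        # S3's per-upload part limit
BYTES_PER_BLOCK = 32      # checksum bytes per uploaded block, per the figure above

total = MAX_PARTS * BYTES_PER_BLOCK
print(total)              # 320000 bytes, i.e. ~312.5 KiB per file
```

That is a lot of metadata to store per entry in a table that is billed per byte read and written.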
BTW, if you are worried about how observable eventual consistency is:
generally it's delayed listings rather than stale content. There's a really
good paper with experimental data which measures how often you can observe RAW
inconsistencies: http://www.aifb.kit.edu/images/8/8d/Ic2e2014.pdf
> S3Guard: use object version to protect against inconsistent read after
> replace/overwrite
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Ben Roling
> Priority: Major
> Attachments: HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions. If a file is written in
> S3A with S3Guard and then subsequently overwritten, there is no protection
> against the next reader seeing the old version of the file instead of the new
> one.
> It seems like the S3Guard metadata could track the S3 object version. When a
> file is created or updated, the object version could be written to the
> S3Guard metadata. When a file is read, the read out of S3 could be performed
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my
> impression from looking through the code. My organization is looking to
> shift some datasets stored in HDFS over to S3 and is concerned about this
> potential issue as there are some cases in our codebase that would do an
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite
> track down any JIRAs discussing it. If there is one, feel free to close this
> with a reference to it.
> Am I understanding things correctly? Is this idea feasible? Any feedback
> that could be provided would be appreciated. We may consider crafting a
> patch.
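The version-pinned read scheme the issue describes could be modelled like this. A hypothetical Python sketch: the dicts stand in for a versioned S3 bucket and the S3Guard table, and real code would store the `VersionId` returned by the PUT and pass it back on the GET (e.g. boto3's `get_object(..., VersionId=...)`).

```python
# Toy model of version-pinned reads: the metadata store records the
# object version written, and reads request exactly that version.

s3_versions = {}  # key -> {version_id: body}: stands in for a versioned bucket
metadata = {}     # key -> version_id: stands in for the S3Guard table
_counter = 0

def put(key, body):
    global _counter
    _counter += 1
    version_id = f"v{_counter}"       # S3 would return this from the PUT
    s3_versions.setdefault(key, {})[version_id] = body
    metadata[key] = version_id        # record it in the metadata store
    return version_id

def get(key):
    # Read by the exact version recorded at write time, so an
    # eventually consistent unversioned GET can't hand back old data.
    version_id = metadata[key]
    return s3_versions[key][version_id]

put("table/data.orc", b"old")
put("table/data.orc", b"new")   # overwrite the same key
print(get("table/data.orc"))    # always the version recorded at write time
```

The design choice this illustrates: instead of hoping a plain GET converges, the reader names the exact version it wants, moving the consistency requirement into the metadata store's single record per key.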
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)