[ https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758719#comment-16758719 ]

Ben Roling edited comment on HADOOP-16085 at 2/1/19 10:21 PM:
--------------------------------------------------------------

Thanks for keeping the feedback coming!

 

[~ste...@apache.org]
{quote}if you use the S3A committers for your work, and the default mode 
-insert a guid into the filename- then filenames are always created unique.  It 
becomes impossible to get a RAW inconsistency. This is essentially where we are 
going, along with Apache Iceberg (incubating).
{quote}
Most of our processing is currently in Apache Crunch, for which the S3A 
committers don't really seem to apply at the moment.

I've seen the Apache Iceberg project and it does look quite interesting.  It's 
not practical for us to get everything onto Iceberg before moving things to S3 
though.  We'll probably look at it more closely in the future.
{quote}I like etags because they are exposed in getFileChecksum(); their flaw 
is that they can be very large on massive MPUs (32bytes/block uploaded).
{quote}
I'm not sure what you mean about getFileChecksum().  I would expect to pull the 
etags from PutObjectResult.getETag() and 
CompleteMultipartUploadResult.getETag().  It doesn't seem necessary to me to 
track an etag per uploaded block.  Is there something I am missing?
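To be concrete, this minimal sketch (against the AWS SDK for Java 1.x; the bucket, key, and file names are just placeholders) is what I have in mind for capturing the etag at write time:
{code:java}
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.PutObjectResult;

public class EtagCapture {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // A simple PUT returns a single etag for the whole object; a completed
    // MPU likewise returns one (composite) etag via
    // CompleteMultipartUploadResult.getETag(). Either way it is one value
    // per object, which is what would be recorded in the metadata store.
    PutObjectResult result = s3.putObject("my-bucket", "path/to/file",
        new File("local-file"));
    System.out.println("etag to record: " + result.getETag());
  }
}
{code}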
{quote}BTW, if you are worried about how observable is eventual consistency, 
generally its delayed listings over actual content. There's a really good paper 
with experimental data which does measure how often you can observe RAW 
inconsistencies [http://www.aifb.kit.edu/images/8/8d/Ic2e2014.pdf]
{quote}
Thanks for the reference.  I happened upon a link to that from 
[are-we-consistent-yet|https://github.com/gaul/are-we-consistent-yet] as well.  
I need to give it a full read-through.

 

[~mackrorysd]
{quote}we need to gracefully deal with any row missing an object version. The 
other direction is easy - if this simply adds a new field, old code will ignore 
it and we'll continue to get the current behavior.
{quote}
I don't think this is too much of a problem.  I believe the code in my patch 
already handles it.
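Roughly, the handling amounts to this (an illustrative sketch, not the exact patch code; the method and parameter names here are made up):
{code:java}
import com.amazonaws.services.s3.model.GetObjectRequest;

// Graceful degradation: a metadata row written by old code carries no
// version id, so the read simply isn't pinned.
public class VersionAwareRead {
  static GetObjectRequest buildRequest(String bucket, String key,
      String recordedVersionId) {
    GetObjectRequest request = new GetObjectRequest(bucket, key);
    if (recordedVersionId != null) {
      // Pin the read to the version recorded at write time.
      request.setVersionId(recordedVersionId);
    }
    // recordedVersionId == null: plain GET, i.e. the current behavior.
    return request;
  }
}
{code}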

 
{quote}My other concern is that this requires enabling object versioning. I 
know [~fabbri] has done some testing with that and I think eventually hit 
issues. Was it just a matter of the space all the versions were taking up, or 
was it actually a performance problem once there was enough overhead?
{quote}
I'd like to hear more about this.  From a space perspective, the [S3 
documentation|https://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html]
 says a version ID can be up to 1024 characters, but in my experience they are 
32 characters (the same length as an etag).  As I mentioned before, I'm looking 
at switching the patch over to use etag instead of object version anyway.  I 
haven't gotten around to the code changes yet, but it doesn't seem like it 
would be much work.  It's just a different field on PutObjectResult, 
CompleteMultipartUploadResult, and GetObjectRequest.
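A sketch of the difference on the read side (again just illustrative, SDK 1.x):
{code:java}
import com.amazonaws.services.s3.model.GetObjectRequest;

public class PinnedReads {
  // Current patch: pin the GET to the recorded version id.
  static GetObjectRequest byVersion(String bucket, String key,
      String versionId) {
    return new GetObjectRequest(bucket, key).withVersionId(versionId);
  }

  // Proposed switch: constrain the GET by the recorded etag instead.
  // Note: with SDK 1.x, if an etag constraint doesn't match,
  // getObject() returns null rather than throwing, so the caller
  // has to check for that.
  static GetObjectRequest byEtag(String bucket, String key, String etag) {
    return new GetObjectRequest(bucket, key).withMatchingETagConstraint(etag);
  }
}
{code}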

 

 


> S3Guard: use object version to protect against inconsistent read after 
> replace/overwrite
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16085
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Ben Roling
>            Priority: Major
>         Attachments: HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.


