[
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758672#comment-16758672
]
Sean Mackrory commented on HADOOP-16085:
----------------------------------------
Thanks for submitting a patch [~ben.roling]. Haven't had a chance to do a full
review yet, but one of [~fabbri]'s comments was also high on my list of things
to watch out for:
{quote}Backward / forward compatible with existing S3Guarded buckets and Dynamo
tables.{quote}
Specifically, we need to gracefully deal with any row missing an object
version. The other direction is easy - if this simply adds a new field, old
code will ignore it and we'll continue to get the current behavior.
My other concern is that this requires enabling object versioning on the
bucket. I know [~fabbri] has done some testing with that and I think he
eventually hit issues. Was it just a matter of the space all the old versions
were taking up, or did it become an actual performance problem once there was
enough version overhead?
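To make the compatibility point concrete, here is a minimal sketch of the read
path being discussed: pin the GET to the version recorded at write time when
the metadata row has one, and fall back to a plain GET for rows written before
this feature. All names here are illustrative stand-ins, not from the attached
patch; a real implementation would build an AWS SDK GetObjectRequest instead
of the stub type below.

```java
import java.util.Optional;

public class VersionedReadSketch {

    // Stand-in for an S3 GET request; illustrative only.
    static final class GetRequest {
        final String key;
        final Optional<String> versionId;
        GetRequest(String key, Optional<String> versionId) {
            this.key = key;
            this.versionId = versionId;
        }
    }

    // Rows written by old code have no version field, so we issue an
    // unpinned GET and keep today's behavior (backward compatible).
    static GetRequest buildGet(String key, Optional<String> storedVersionId) {
        if (storedVersionId.isPresent()) {
            // Pin the read to the exact object version recorded at write
            // time, so an overwrite by another writer cannot hand us a
            // stale-but-different object than the one S3Guard knows about.
            return new GetRequest(key, storedVersionId);
        }
        return new GetRequest(key, Optional.empty());
    }

    public static void main(String[] args) {
        GetRequest pinned = buildGet("data/part-0000",
                Optional.of("3HL4kqtJlcpXroDTDmJ"));
        GetRequest legacy = buildGet("data/part-0001", Optional.empty());
        System.out.println(pinned.versionId.orElse("(unpinned)"));
        System.out.println(legacy.versionId.orElse("(unpinned)"));
    }
}
```

The forward direction falls out of the same branch: old code simply never
reads the new field, so nothing changes for it.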
> S3Guard: use object version to protect against inconsistent read after
> replace/overwrite
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Ben Roling
> Priority: Major
> Attachments: HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions. If a file is written in
> S3A with S3Guard and then subsequently overwritten, there is no protection
> against the next reader seeing the old version of the file instead of the new
> one.
> It seems like the S3Guard metadata could track the S3 object version. When a
> file is created or updated, the object version could be written to the
> S3Guard metadata. When a file is read, the read out of S3 could be performed
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my
> impression from looking through the code. My organization is looking to
> shift some datasets stored in HDFS over to S3 and is concerned about this
> potential issue as there are some cases in our codebase that would do an
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite
> track down any JIRAs discussing it. If there is one, feel free to close this
> with a reference to it.
> Am I understanding things correctly? Is this idea feasible? Any feedback
> that could be provided would be appreciated. We may consider crafting a
> patch.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)