[ 
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797173#comment-16797173
 ] 

Gabor Bota commented on HADOOP-16085:
-------------------------------------

Hi [~ben.roling]! Thanks for working on this.

h3. TestPathMetadataDynamoDBTranslation.java
I started reviewing the patch. The last time we added a new field to the 
DynamoDB table, we added tests to {{TestPathMetadataDynamoDBTranslation}} to 
verify that the implementation still works when the field is ignored (that 
is, not present in the item). Please add the same kind of tests for ETAG and 
VERSION_ID: if these fields are ignored, {{PathMetadataDynamoDBTranslation}} 
should still function as expected. We could even create a parameterized test 
that takes the list of ignored fields and checks that the translation still 
works when those fields are not yet present in DynamoDB. We need these tests 
to show that the improvement is backward compatible, so that no manual update 
is required.
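
To illustrate the shape of such a test, here is a minimal, self-contained sketch. It does not use the real {{PathMetadataDynamoDBTranslation}} API or JUnit: {{Meta}}, {{itemToMeta}}, {{fullItem}}, {{translatesWithout}} and the attribute names are all hypothetical stand-ins, and a plain {{Map}} stands in for a DynamoDB item. The only point is the pattern of looping over sets of ignored fields and checking that the translation still succeeds.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IgnoredFieldsSketch {
    // Hypothetical attribute names and test values (assumptions, not the real schema).
    static final String PATH = "path";
    static final String ETAG = "etag";
    static final String VERSION_ID = "version_id";
    static final String TEST_PATH = "/bucket/file";
    static final String TEST_ETAG = "abc123";
    static final String TEST_VERSION_ID = "v1";

    // Hypothetical stand-in for the translated metadata object.
    static final class Meta {
        final String path;
        final String eTag;       // null when the item has no etag attribute
        final String versionId;  // null when the item has no version id attribute

        Meta(String path, String eTag, String versionId) {
            this.path = path;
            this.eTag = eTag;
            this.versionId = versionId;
        }
    }

    // Hypothetical translation: path is required, the new fields are optional.
    public static Meta itemToMeta(Map<String, String> item) {
        String path = item.get(PATH);
        if (path == null) {
            throw new IllegalArgumentException("path is required");
        }
        return new Meta(path, item.get(ETAG), item.get(VERSION_ID));
    }

    // Builds an item that carries every field, as a newly written entry would.
    public static Map<String, String> fullItem() {
        Map<String, String> item = new HashMap<>();
        item.put(PATH, TEST_PATH);
        item.put(ETAG, TEST_ETAG);
        item.put(VERSION_ID, TEST_VERSION_ID);
        return item;
    }

    // One parameterized case: drop the ignored fields, translate, verify the rest.
    public static boolean translatesWithout(Set<String> ignored) {
        Map<String, String> item = fullItem();
        item.keySet().removeAll(ignored);
        Meta meta = itemToMeta(item);
        boolean ok = TEST_PATH.equals(meta.path);
        ok &= ignored.contains(ETAG)
            ? meta.eTag == null : TEST_ETAG.equals(meta.eTag);
        ok &= ignored.contains(VERSION_ID)
            ? meta.versionId == null : TEST_VERSION_ID.equals(meta.versionId);
        return ok;
    }

    public static void main(String[] args) {
        // The parameter list: each entry is one set of ignored fields.
        List<Set<String>> cases = Arrays.asList(
            Collections.<String>emptySet(),
            Collections.singleton(ETAG),
            Collections.singleton(VERSION_ID),
            new HashSet<>(Arrays.asList(ETAG, VERSION_ID)));
        for (Set<String> ignored : cases) {
            System.out.println("ignored=" + ignored + " ok=" + translatesWithout(ignored));
        }
    }
}
```

In the actual patch, the same idea would presumably be expressed as a parameterized JUnit test in {{TestPathMetadataDynamoDBTranslation}}, with the list of ignored fields as the test parameter.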

h3. TestDirListingMetadata.java
Nit: in {{TestPathMetadataDynamoDBTranslation}} you defined {{TEST_ETAG}} and 
{{TEST_VERSION_ID}}. In {{TestDirListingMetadata}} you use string literals 
instead. Please define the constants here as well for readability and 
consistency.

h3. site/markdown/tools/hadoop-aws/s3guard.md
Please add a description of this feature to the docs. Please describe which 
kinds of inconsistencies are handled, and how. The part added so far is OK, 
but I think it needs to be extended.

I will do another round of review and run some tests.

> S3Guard: use object version or etags to protect against inconsistent read 
> after replace/overwrite
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16085
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Ben Roling
>            Priority: Major
>         Attachments: HADOOP-16085-003.patch, HADOOP-16085_002.patch, 
> HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
