[ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790971#comment-16790971
 ] 

Ben Roling commented on HADOOP-15625:
-------------------------------------

I also ran the integration test suite against the bucket with object 
versioning enabled and fs.s3a.change.detection.source=versionid, as one 
additional scenario.  The tests passed, just as they had with the default 
configuration.  Furthermore, ITestS3ARemoteFileChanged successfully tested all 
the permutations of the fs.s3a.change.detection configuration.

Is there anything more I should do from an integration testing perspective?

> S3A input stream to use etags/version number to detect changed source files
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-15625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15625
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Ben Roling
>            Priority: Major
>         Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch, HADOOP-15625-015.patch, 
> HADOOP-15625-015.patch, HADOOP-15625-016.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't notice that the file 
> has changed, it caches the length from startup, and whenever a seek triggers 
> a new GET, you may get old data or new data, or even go from new data back 
> to old data because of eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.
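
For illustration, the three steps in the description above could be sketched 
roughly as follows. This is a minimal, self-contained sketch and not the code 
from the attached patches: ChangeDetectingReader, Store and Response are 
hypothetical names made up for this example, and the store is modelled as a 
plain interface rather than the real S3 client.

```java
import java.io.IOException;
import java.util.Objects;

/** Hypothetical sketch of etag-based change detection; not the S3A classes. */
public class ChangeDetectingReader {

    /** Hypothetical stand-in for one GET response: the data plus the etag returned with it. */
    public static final class Response {
        final String etag;
        final byte[] data;

        public Response(String etag, byte[] data) {
            this.etag = Objects.requireNonNull(etag, "etag");
            this.data = Objects.requireNonNull(data, "data");
        }
    }

    /** Hypothetical stand-in for the object store; each call models a new ranged GET. */
    public interface Store {
        Response get(String key, long offset, int length) throws IOException;
    }

    private final Store store;
    private final String key;
    private String cachedEtag; // null until the first response has been seen

    public ChangeDetectingReader(Store store, String key) {
        this.store = Objects.requireNonNull(store, "store");
        this.key = Objects.requireNonNull(key, "key");
    }

    /** Issues a new GET and fails if the etag differs from the first one seen. */
    public byte[] read(long offset, int length) throws IOException {
        Response response = store.get(key, offset, length);
        if (cachedEtag == null) {
            // Step 1: cache the etag of the first response.
            cachedEtag = response.etag;
        } else if (!cachedEtag.equals(response.etag)) {
            // Steps 2 and 3: verify the etag on later GETs, raise an IOE on mismatch.
            throw new IOException("Remote file " + key + " changed during read: expected etag "
                + cachedEtag + " but got " + response.etag);
        }
        return response.data;
    }
}
```

A reader built this way fails loudly on the first GET that returns a different 
etag, rather than silently mixing old and new data. The same comparison would 
work with a version ID in place of the etag.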



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

