[ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771969#comment-16771969
 ] 

Ben Roling commented on HADOOP-15625:
-------------------------------------

bq. although I wouldn't expect it to be seen so often as to be offensive to 
users of such third-party stores (assuming such stores actually exist).

This is sort of embarrassing.  I don't know what exactly I was thinking when I 
wrote that.  If GetObject never returns an eTag for some third-party store and 
we log a warning each time that happens, then anyone using that store would 
see a warning on every single file read.  Obviously that would look 
stupid.

It does feel like we will need some form of configuration if we're worried 
about third-party stores not supporting eTags (such as not returning them on 
GetObject or not supporting withMatchingETagConstraint()).  I'll just go ahead 
and add some configuration around this in my next version of the patch.  I'm 
still waiting on the feedback about the Exception type though.
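
A minimal sketch of what such configuration might look like, assuming a 
single setting that selects how a missing eTag is handled.  The names 
MissingETagPolicy and MissingETagHandler are hypothetical and are not taken 
from any attached patch; the warn-once option is just one way to avoid the 
warning-on-every-read problem described above.

```java
import java.io.IOException;
import java.util.Locale;

/** Hypothetical policy values; not an existing Hadoop configuration option. */
enum MissingETagPolicy {
  IGNORE, WARN_ONCE, FAIL;

  /** Parse a configuration string such as "warn_once" (case-insensitive). */
  static MissingETagPolicy parse(String value) {
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}

/** Hypothetical handler invoked when a response carries no eTag. */
final class MissingETagHandler {
  private final MissingETagPolicy policy;
  private int warningCount;

  MissingETagHandler(MissingETagPolicy policy) {
    this.policy = policy;
  }

  /** Apply the configured policy for a response that has no eTag. */
  void onMissingETag(String path) throws IOException {
    switch (policy) {
      case IGNORE:
        break;
      case WARN_ONCE:
        // Warn only on the first occurrence so that a store which never
        // returns eTags does not produce a warning on every read.
        if (warningCount == 0) {
          System.err.println("WARN: no eTag returned for " + path
              + "; change detection is disabled");
          warningCount++;
        }
        break;
      case FAIL:
        throw new IOException("No eTag returned for " + path);
      default:
        throw new IllegalStateException("Unknown policy " + policy);
    }
  }

  /** Number of warnings emitted so far. */
  int getWarningCount() {
    return warningCount;
  }
}
```

With WARN_ONCE a store that never returns eTags yields one warning per 
handler instead of one per read; FAIL suits deployments that require change 
detection.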

> S3A input stream to use etags to detect changed source files
> ------------------------------------------------------------
>
>                 Key: HADOOP-15625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15625
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Major
>         Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't notice that the file 
> has changed, caches the length from startup, and whenever a seek triggers a 
> new GET, you may get old data, new data, or perhaps even go from new data 
> back to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.
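
The three steps in the description above can be sketched as follows.  This 
is only an illustration under stated assumptions: ETagChangeTracker and its 
methods are hypothetical names, not the classes from the attached patches, 
and the caller is assumed to pass in the eTag string taken from each 
HEAD/GET response.

```java
import java.io.IOException;

/** Hypothetical helper that remembers the first eTag and checks later ones. */
final class ETagChangeTracker {
  private final String path;
  private String expectedETag; // null until the first eTag has been seen

  ETagChangeTracker(String path) {
    this.path = path;
  }

  /**
   * Call with the eTag of every HEAD/GET response for the file.
   * The first non-null eTag is cached; later ones must match it.
   */
  void processResponse(String eTag) throws IOException {
    if (eTag == null) {
      // The store returned no eTag, so there is nothing to compare.
      return;
    }
    if (expectedETag == null) {
      expectedETag = eTag;
      return;
    }
    if (!expectedETag.equals(eTag)) {
      throw new IOException("eTag of " + path + " changed during read:"
          + " expected " + expectedETag + " but got " + eTag);
    }
  }

  /** The cached eTag, or null if none has been seen yet. */
  String getExpectedETag() {
    return expectedETag;
  }
}
```

A stream would create one tracker when the file is opened and call 
processResponse() after each request, so a mismatch surfaces as an 
IOException on the read that triggered the new GET.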



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)