Steve Loughran created HADOOP-15625:
---------------------------------------
Summary: S3A input stream to use etags to detect changed source files
Key: HADOOP-15625
URL: https://issues.apache.org/jira/browse/HADOOP-15625
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.2.0
Reporter: Brahma Reddy Battula
S3A input stream doesn't handle changing source files any better than the other
cloud store connectors. Specifically: it doesn't notice that the file has
changed, it caches the length from startup, and whenever a seek triggers a new
GET you may get old data, new data, or even, due to eventual consistency, new
data followed by old data.
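A minimal, hypothetical simulation of the failure mode described above. `FakeStore`, `NaiveStream` and `NaiveDemo` are illustrative names, not S3A classes: the store is an in-memory map where each put replaces the whole object, and the stream caches the length at open and issues a fresh ranged GET per read. An overwrite between two reads then yields bytes from two different versions with no error. It does not model eventual consistency (the new-then-old case).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory object store (not the S3 or S3A API).
// Each put() replaces the whole object, like an overwriting PUT.
class FakeStore {
  private final Map<String, byte[]> objects = new HashMap<>();

  void put(String key, String body) {
    objects.put(key, body.getBytes(StandardCharsets.UTF_8));
  }

  int length(String key) {
    return objects.get(key).length;
  }

  // Ranged GET of bytes [start, end) from whichever version is current now.
  byte[] get(String key, int start, int end) {
    byte[] data = objects.get(key);
    int from = Math.min(start, data.length);
    int to = Math.max(from, Math.min(end, data.length));
    return Arrays.copyOfRange(data, from, to);
  }
}

// Hypothetical naive reader: caches the length at open, issues a new
// ranged GET for every read, and never checks whether the object changed.
class NaiveStream {
  private final FakeStore store;
  private final String key;
  private final int cachedLength; // captured once, never refreshed

  NaiveStream(FakeStore store, String key) {
    this.store = store;
    this.key = key;
    this.cachedLength = store.length(key);
  }

  String read(int pos, int len) {
    int end = Math.min(pos + len, cachedLength);
    return new String(store.get(key, pos, end), StandardCharsets.UTF_8);
  }
}

class NaiveDemo {
  static String run() {
    FakeStore store = new FakeStore();
    store.put("data.csv", "AAAA");
    NaiveStream in = new NaiveStream(store, "data.csv");
    String first = in.read(0, 2);  // GET #1 sees version 1: "AA"
    store.put("data.csv", "BBBB"); // another client overwrites the file
    String second = in.read(2, 2); // seek + GET #2 sees version 2: "BB"
    return first + second;         // "AABB": matches neither version
  }
}
```

`NaiveDemo.run()` returns `"AABB"`, a result that never existed in the store, and the reader has no way to tell.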
We can't do anything to stop this, but we could detect changes by:
# caching the etag of the first HEAD/GET (we don't get that HEAD on open with S3Guard, BTW)
# on future GET requests, verifying the etag of the response
# raising an IOException if the etag differs, i.e. the remote file changed during the read.
It's a more dramatic failure, but it stops changes from silently corrupting reads.
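The three steps above can be sketched roughly as follows. This is a hypothetical illustration, not the S3A implementation: `EtagStore`, `EtagCheckingStream`, `SourceChangedException` and `EtagDemo` are invented names, and the store fakes etags with a per-PUT version counter. The etag is taken from the first GET (matching the S3Guard case, where there is no HEAD on open) and compared on every later GET.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception for this sketch; thrown when the etag changes.
class SourceChangedException extends IOException {
  SourceChangedException(String msg) { super(msg); }
}

// Hypothetical in-memory store whose GET responses carry an etag.
// Every put() assigns a new etag (a simple version counter here).
class EtagStore {
  static final class Response {
    final byte[] body;
    final String etag;
    Response(byte[] body, String etag) { this.body = body; this.etag = etag; }
  }

  private final Map<String, byte[]> data = new HashMap<>();
  private final Map<String, String> etags = new HashMap<>();
  private int version = 0;

  void put(String key, String body) {
    data.put(key, body.getBytes(StandardCharsets.UTF_8));
    etags.put(key, "\"v" + (++version) + "\"");
  }

  // Ranged GET of bytes [start, end) plus the current etag.
  Response get(String key, int start, int end) {
    byte[] bytes = data.get(key);
    int from = Math.min(start, bytes.length);
    int to = Math.max(from, Math.min(end, bytes.length));
    return new Response(Arrays.copyOfRange(bytes, from, to), etags.get(key));
  }
}

class EtagCheckingStream {
  private final EtagStore store;
  private final String key;
  private String expectedEtag; // null until the first GET

  EtagCheckingStream(EtagStore store, String key) {
    this.store = store;
    this.key = key;
  }

  String read(int pos, int len) throws IOException {
    EtagStore.Response r = store.get(key, pos, pos + len);
    if (expectedEtag == null) {
      expectedEtag = r.etag;                     // 1: cache the first etag
    } else if (!expectedEtag.equals(r.etag)) {   // 2: verify later GETs
      throw new SourceChangedException(          // 3: fail with an IOException
          key + ": etag changed from " + expectedEtag + " to " + r.etag);
    }
    return new String(r.body, StandardCharsets.UTF_8);
  }
}

class EtagDemo {
  static String run() {
    EtagStore store = new EtagStore();
    store.put("data.csv", "AAAA");
    EtagCheckingStream in = new EtagCheckingStream(store, "data.csv");
    try {
      String first = in.read(0, 2);  // first GET records etag "v1"
      store.put("data.csv", "BBBB"); // overwrite assigns etag "v2"
      return first + in.read(2, 2);  // second GET sees "v2" and throws
    } catch (IOException e) {
      return "failed: " + e.getMessage();
    }
  }
}
```

Comparing client-side keeps the sketch close to the listed steps; S3 also accepts an `If-Match` header on GET, which would let the server itself reject a read of a changed object with a 412 response.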
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)