[ 
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763676#comment-16763676
 ] 

Ben Roling commented on HADOOP-16085:
-------------------------------------

Thanks for the feedback [[email protected]].

With respect to the S3AFileSystem.getFileStatus() change, I should have been a 
bit clearer.  I changed only the declared return type in the method signature, 
not the actual type of the object being returned.  S3AFileSystem.getFileStatus() 
is just a wrapper over innerGetFileStatus(), which was already returning 
S3AFileStatus.  As such, I don't think the change should have introduced any new 
serialization concerns, right?  I'll avoid the method signature change though 
and use casts where necessary instead.
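
To illustrate the two options mentioned above (narrowing the declared return type versus casting at call sites), here is a minimal sketch.  All class and method names in it are hypothetical stand-ins, not the real Hadoop classes; it only shows that both options hand back the same runtime object.

```java
// Hypothetical stand-ins for a base status type and a richer subtype.
class BaseStatus {
    final String path;
    BaseStatus(String path) { this.path = path; }
}

class DetailedStatus extends BaseStatus {
    final String eTag;
    DetailedStatus(String path, String eTag) { super(path); this.eTag = eTag; }
}

// Hypothetical stand-in for the base filesystem class.
class BaseFs {
    BaseStatus getStatus(String path) { return new BaseStatus(path); }
}

// Option A: narrow the declared return type (a covariant override).
class NarrowedFs extends BaseFs {
    @Override
    DetailedStatus getStatus(String path) { return innerGetStatus(path); }

    DetailedStatus innerGetStatus(String path) {
        return new DetailedStatus(path, "etag-1");
    }
}

// Option B: keep the base signature and cast where the subtype is needed.
class CastingFs extends BaseFs {
    @Override
    BaseStatus getStatus(String path) { return innerGetStatus(path); }

    DetailedStatus innerGetStatus(String path) {
        return new DetailedStatus(path, "etag-1");
    }
}

class SignatureSketch {
    // No cast needed: the compiler already knows the subtype.
    static String eTagViaNarrowed(String path) {
        return new NarrowedFs().getStatus(path).eTag;
    }

    // Same runtime object, but the caller has to cast to reach the extra field.
    static String eTagViaCast(String path) {
        BaseStatus status = new CastingFs().getStatus(path);
        return ((DetailedStatus) status).eTag;
    }
}
```

In both variants the object created is the same subtype; only the compile-time view of it differs.
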
{quote}IMO, failing because a file has been overwritten is fine, but ideally it 
should fail with a meaningful error, not EOF
{quote}
Fair point.  I had meant to ask for feedback on the exception type in this 
scenario anyway but failed to do so in my last comment.  I chose EOFException 
to match the current behavior in the seek() after overwrite scenario and 
because I was having trouble choosing a better exception type.  I considered 
FileNotFoundException, but that didn't feel right as the file does still 
exist.  Something like ConcurrentModificationException is closer in meaning, 
but that's more Java Collections oriented and is not an IOException.  I looked 
through the available IOException types for an equivalent and didn't find a 
good fit.  Another option I considered was creating a new IOException type 
within the S3A package.  Did you have any specific suggestions?
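
As a rough sketch of the "new IOException type" option mentioned above: the class name, fields, and message format below are purely hypothetical and are not taken from any patch.

```java
import java.io.IOException;

// Hypothetical exception type for "the object changed while it was being read".
// It extends IOException so existing callers that catch IOException still work.
class ObjectChangedDuringReadException extends IOException {
    private final String path;
    private final long position;

    ObjectChangedDuringReadException(String path, long position, String detail) {
        super("Object " + path + " changed during read at position "
            + position + ": " + detail);
        this.path = path;
        this.position = position;
    }

    String getPath() { return path; }

    long getPosition() { return position; }
}
```

A dedicated type like this would let callers distinguish an overwrite from a genuine end of file, which EOFException cannot express.
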
{quote}One of the committer tests is going to have to be extended for this
{quote}
Ah, yeah, I'll need to dig deeper on that subject to better understand how 
those work and the updates that would be needed.
{quote}how about we start with HADOOP-15625 to make the input stream use etag 
to detect failures in a file, with tests to create those conditions{quote}

Sure, that sounds reasonable.  I'll create a patch for that.
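
For discussion, the etag check could look roughly like the sketch below.  It is a plain-Java illustration under my own assumptions: the class and method names are hypothetical, the etag is passed in as a string rather than read from a real S3 response, and a plain IOException stands in for whatever exception type is eventually chosen.

```java
import java.io.IOException;

// Hypothetical helper: remembers the etag seen when the stream first opened the
// object and rejects any later response that reports a different etag.
class ETagChangeDetector {
    private final String path;
    private String expectedETag; // null until the first response is observed

    ETagChangeDetector(String path) { this.path = path; }

    // True if the etag matches the first one seen (or none has been seen yet).
    boolean isUnchanged(String eTag) {
        return expectedETag == null || eTag == null || expectedETag.equals(eTag);
    }

    // Called after each (re)open of the object, e.g. following a seek().
    void onResponse(String eTag, long position) throws IOException {
        if (eTag == null) {
            return; // no etag reported, nothing to compare against
        }
        if (expectedETag == null) {
            expectedETag = eTag; // first response: record the etag
            return;
        }
        if (!expectedETag.equals(eTag)) {
            throw new IOException("ETag of " + path + " changed from "
                + expectedETag + " to " + eTag + " at position " + position);
        }
    }
}
```

A test for this would open a stream, overwrite the object, then seek and read, expecting the failure on the reopen rather than an EOF.
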

> S3Guard: use object version to protect against inconsistent read after 
> replace/overwrite
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16085
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Ben Roling
>            Priority: Major
>         Attachments: HADOOP-16085_002.patch, HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.
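
The idea in the issue description above (record the object version on write, read by that version) can be sketched with in-memory maps.  Everything here is a hypothetical stand-in: no real S3 or S3Guard API is used, and the stale "latest" view is simulated deliberately to mimic an inconsistent read after overwrite.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedReadSketch {
    // Stand-in for the object store: path -> (versionId -> content).
    private final Map<String, Map<String, String>> store = new HashMap<>();
    // Stand-in for an inconsistent "latest" view that lags behind overwrites.
    private final Map<String, String> staleLatest = new HashMap<>();
    // Stand-in for the metadata store: path -> versionId of the last write.
    private final Map<String, String> metadata = new HashMap<>();
    private int counter = 0;

    // Writes a new version and records its id in the metadata store.
    String write(String path, String content) {
        String versionId = "v" + (++counter);
        store.computeIfAbsent(path, p -> new HashMap<>()).put(versionId, content);
        // Simulated lag: the unversioned view keeps the first content it saw.
        staleLatest.putIfAbsent(path, content);
        metadata.put(path, versionId);
        return versionId;
    }

    // Unversioned read: may return the old content after an overwrite.
    String readLatest(String path) {
        return staleLatest.get(path);
    }

    // Versioned read: looks up the tracked version and reads exactly that one.
    String readByTrackedVersion(String path) {
        String versionId = metadata.get(path);
        Map<String, String> versions = store.get(path);
        if (versionId == null || versions == null) {
            return null; // path was never written through this sketch
        }
        return versions.get(versionId);
    }
}
```

After writing "old" and then "new" to the same path, readLatest() in this simulation still returns the old content while readByTrackedVersion() returns the new one.
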



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
