[ 
https://issues.apache.org/jira/browse/HADOOP-16221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830188#comment-16830188
 ] 

Hudson commented on HADOOP-16221:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16480 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16480/])
HADOOP-16221. S3Guard: add option to fail operation on metadata write (stevel: rev 0af4011580878566213016af0c32633eabd15100)
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* (add) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/MetadataPersistenceException.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AMetadataPersistenceException.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Retries.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java


> S3Guard: fail write that doesn't update metadata store
> ------------------------------------------------------
>
>                 Key: HADOOP-16221
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16221
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Ben Roling
>            Assignee: Ben Roling
>            Priority: Major
>             Fix For: 3.3.0
>
>
> Right now, a failure to write to the S3Guard metadata store (e.g. DynamoDB) 
> is [merely 
> logged|https://github.com/apache/hadoop/blob/rel/release-3.1.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2708-L2712].
> It does not fail the S3AFileSystem write operation itself, so the writer 
> has no indication that anything went wrong. As a result, S3Guard does not 
> always provide the consistency it advertises.
> For example [this 
> article|https://blog.cloudera.com/blog/2017/08/introducing-s3guard-s3-consistency-for-apache-hadoop/]
>  states:
> {quote}If a Hadoop S3A client creates or moves a file, and then a client 
> lists its directory, that file is now guaranteed to be included in the 
> listing.
> {quote}
> Unfortunately, this guarantee does not hold when a metadata write fails, 
> which could result in exactly the sort of problem S3Guard is supposed to 
> avoid:
> {quote}Missing data that is silently dropped. Multi-step Hadoop jobs that 
> depend on output of previous jobs may silently omit some data. This omission 
> happens when a job chooses which files to consume based on a directory 
> listing, which may not include recently-written items.
> {quote}
> Imagine the typical multi-job Hadoop processing pipeline. Job 1 runs and 
> succeeds, but one or more S3Guard metadata writes failed under the covers. 
> Job 2 picks up the output directory from Job 1 and runs its processing, 
> potentially seeing an inconsistent listing and silently missing some of the 
> Job 1 output files.
> S3Guard should at least provide a configuration option to fail the 
> operation if the metadata write fails. Arguably, this should even be the 
> default.
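
To make the difference between the two behaviors concrete, here is a minimal Java sketch of log-only handling versus failing the write. Every name in it (MetadataWriteSketch, Store, MetadataWriteException, finishedWrite, the failOnError flag) is hypothetical and chosen for illustration only. It is not the code from the commit and does not reproduce the actual S3AFileSystem or MetadataPersistenceException implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these names are real S3A classes. */
class MetadataWriteSketch {

  /** Hypothetical stand-in for a metadata store such as DynamoDB. */
  interface Store {
    void put(String path) throws IOException;
  }

  /** Hypothetical exception raised when the metadata write fails. */
  static class MetadataWriteException extends IOException {
    MetadataWriteException(String path, Throwable cause) {
      super("Failed to record " + path + " in the metadata store", cause);
    }
  }

  /** Warnings are collected here in place of a real logger. */
  static final List<String> warnings = new ArrayList<>();

  /**
   * Called after the object itself has been written successfully.
   * failOnError=false mirrors the log-only behavior described in the issue;
   * failOnError=true surfaces the failure to the writer.
   */
  static void finishedWrite(Store store, String path, boolean failOnError)
      throws IOException {
    try {
      store.put(path);
    } catch (IOException e) {
      if (failOnError) {
        throw new MetadataWriteException(path, e);
      }
      warnings.add("metadata update failed for " + path + ": " + e.getMessage());
    }
  }

  public static void main(String[] args) throws IOException {
    // A store whose writes always fail.
    Store broken = path -> { throw new IOException("store unavailable"); };

    // Log-only mode: the caller sees success despite the lost metadata entry.
    finishedWrite(broken, "s3a://bucket/job1/part-0000", false);
    System.out.println("warnings logged: " + warnings.size());

    // Fail-fast mode: the caller learns that later listings may be incomplete.
    try {
      finishedWrite(broken, "s3a://bucket/job1/part-0001", true);
    } catch (MetadataWriteException e) {
      System.out.println("write failed: " + e.getMessage());
    }
  }
}
```

In the log-only case, Job 1 would report success and Job 2 could list the directory without the affected file. In the fail-fast case, Job 1 gets an exception it can retry or abort on.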



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

