[ 
https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606749#comment-15606749
 ] 

Lei (Eddy) Xu commented on HADOOP-13651:
----------------------------------------

Hi, [~fabbri]

Thanks for posting this missing and critical piece.  
Please see the reviews below:

* In {{S3AFIleSystem#innerDelete()}}.   The deletion on metadata store should 
happen before on S3. Similar to how HDFS handles deletion, delete namespace 
before deleting actual blocks. We can either 1) throw IOE if the metadata store 
deletion fails.  It can make sure the other clients have a consistent view 
between namespace and data.  In general, it'd be nice if we do S3 first then 
{{MS}} for creation, and do {{MS}} then {{S3}} for deletion.

* In {{getFileStatus()}}, 

{code}
    // Check MetadataStore, if any.
    PathMetadata pm = metadataStore.get(path);
    if (pm != null) {
      // TODO handle deleted files, i.e. PathMetadata#isDeleted()
      return (S3AFileStatus)pm.getFileStatus();
    }
{code}

It might be problematic for {{delete-after-listing}} case. Lets say the client 
A delete file "s3a://bucket/a/b" and client B tries to run {{getFileStatus()}} 
on this file.  And because the eventual consistency of S3:  when client B call 
this function, {{pm == null}}, but in the following code, B is still able to 
get the metadata from S3, thus client B will put {{s3a://bucket/a/b}} back to 
metadata store and returns it as it has not been deleted. 

Similar to the code in {{innerListStatus()}}:

{code}
 DirListingMetadata dirMeta = metadataStore.listChildren(path);
 if (allowAuthoritative && dirMeta != null && dirMeta.isAuthoritative()) {
     return S3Guard.dirMetaToStatuses(dirMeta);
}
{code}

So listStatus will go through this code if either {{allowAuthoritative == 
false}} or {{dirMeta == null}} or {{dirMeta.isAuthorizative == null}}. I don't 
fully understand what scenarios each condition represents. Can you explain a 
little bit how each case happens?

I think it is safer that after this point, just do retries to S3 (or both of 
them) until these two sets of files (from MS and S3) are equal.  Because in 
{{S3Guard.dirListingUnion()}}, {{s3guard}} overwrites the metadata store with 
the union of two sets and hides the fact that inconsistent results happened due 
to S3 eventual consistency.  

IIRC, {{EMRFS}} only updates dynamodb for file / directory creations. It uses 
CLI to load pre-existed data. EMRFS might have the point that for reads like 
{{getFileStatus()}} and {{listStatus()}}, it is too difficult for the 
filesystem to tell what is happening, and which source is wrong. 

* In {{finishedWrite()}}, you need also update the parent directory 
{{isEmptyDirectory()}} flag, similar to wha 
t{{deleteUnccessaryFakeDirectories(p.getParent())}} does.

Some nits:

* The checks for {{isNullMetadataStore()}} seems not necessary, as 
{{NullMetadataStore}} is supposed to do nothing. 
* The patch seems not be able to be cleanly patched to the latest HADOOP-13345. 

Thanks.



> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
>                 Key: HADOOP-13651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13651
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13651-HADOOP-13345.001.patch, 
> HADOOP-13651-HADOOP-13345.002.patch
>
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata 
> consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is 
> configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to