[ 
https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035211#comment-16035211
 ] 

Aaron Fabbri commented on HADOOP-14468:
---------------------------------------

I created this JIRA to follow up on [your 
comment|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=16019741&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16019741]
  and the discussion about failing fast in the read path when a file is not 
visible in S3.

I'm not 100% convinced we want this but it could be useful for:

1. Failing fast on open() instead of when we later read the stream.
2. A "safe mode" or fallback that can be enabled.  When the setting is false, 
we could collect stats whenever the MetadataStore differs from S3, which would 
be interesting, e.g. "s3 / metastore length differs" or "visible in metastore 
but not s3".
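
To illustrate the stats idea in item 2, here is a minimal Java sketch. Every name in it (ConsistencyAudit, Mismatch, compare, count) is hypothetical and not part of S3AFileSystem or the MetadataStore API; it only assumes that each side can report a file length, or nothing when the file is not visible there.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: counts the cases where the MetadataStore and S3 disagree.
final class ConsistencyAudit {
    // The two mismatch kinds mentioned in the comment above.
    enum Mismatch { LENGTH_DIFFERS, VISIBLE_IN_METASTORE_BUT_NOT_S3 }

    private final Map<Mismatch, Long> counts = new EnumMap<>(Mismatch.class);

    /** Compare file lengths; an empty Optional means "no entry on that side". */
    void compare(Optional<Long> metastoreLength, Optional<Long> s3Length) {
        if (!metastoreLength.isPresent()) {
            return; // nothing recorded in the MetadataStore, so nothing to audit
        }
        if (!s3Length.isPresent()) {
            record(Mismatch.VISIBLE_IN_METASTORE_BUT_NOT_S3);
        } else if (!metastoreLength.get().equals(s3Length.get())) {
            record(Mismatch.LENGTH_DIFFERS);
        }
    }

    private void record(Mismatch m) {
        counts.merge(m, 1L, Long::sum);
    }

    long count(Mismatch m) {
        return counts.getOrDefault(m, 0L);
    }
}
```

A real implementation would more likely feed existing filesystem statistics than keep its own map; the map is only there to keep the sketch self-contained.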

In general we do not support a mixed mode where some clients use S3Guard and 
others do not; it is not safe.  However, if there is a well-known path where 
only an external process (e.g. ETL) is dropping files for ingest, it may be 
nice to be able to support that narrower case.  I think the existing 
behavior, where list operations check both S3 and the MetadataStore, is 
sufficient without this change, though.

> S3Guard: make short-circuit getFileStatus() configurable
> --------------------------------------------------------
>
>                 Key: HADOOP-14468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14468
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>
> Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a 
> result from the MetadataStore (e.g. DynamoDB) first.
> I would like to add a new parameter 
> {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps 
> the current behavior.  When false, S3AFileSystem will check both S3 and the 
> MetadataStore.
> I'm not sure yet if we want to have this behavior the same for all callers of 
> getFileStatus(), or if we only want to check both S3 and MetadataStore for 
> some internal callers such as open().
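
A rough Java sketch of the proposed switch follows. It is not the S3AFileSystem code: GuardedStatusLookup, getFileLength, and the two lookup functions are hypothetical stand-ins, and the sketch assumes that in non-authoritative mode S3 is the deciding source, so a file missing from S3 fails immediately.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for the getFileStatus() decision; only file length is modeled.
final class GuardedStatusLookup {
    private final boolean authoritative;                          // models the proposed parameter
    private final Function<String, Optional<Long>> metadataStore; // path -> length, if known
    private final Function<String, Optional<Long>> s3;            // path -> length, if visible

    GuardedStatusLookup(boolean authoritative,
                        Function<String, Optional<Long>> metadataStore,
                        Function<String, Optional<Long>> s3) {
        this.authoritative = authoritative;
        this.metadataStore = metadataStore;
        this.s3 = s3;
    }

    long getFileLength(String path) {
        Optional<Long> cached = metadataStore.apply(path);
        if (authoritative && cached.isPresent()) {
            return cached.get(); // current behavior: a MetadataStore hit skips S3
        }
        // Non-authoritative mode, or a MetadataStore miss: consult S3 as well.
        Optional<Long> remote = s3.apply(path);
        if (!remote.isPresent()) {
            // Assumption: fail fast here rather than later when the stream is read.
            throw new UncheckedIOException(new FileNotFoundException(path));
        }
        return remote.get();
    }
}
```

Whether every caller of getFileStatus() or only internal callers such as open() should take the non-authoritative branch is the open question above; the sketch does not settle it.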



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
