[ https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035211#comment-16035211 ]
Aaron Fabbri commented on HADOOP-14468:
---------------------------------------

I created this JIRA to follow up on [your comment|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=16019741&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16019741] and the discussion about failing fast in the read path when a file is not visible in S3.

I'm not 100% convinced we want this, but it could be useful for:
1. Failing fast on open() instead of when we later read the stream.
2. A "safe mode" or fallback that can be enabled. When this is set to false, we could collect stats on every case where the MetadataStore differs from S3, which would be interesting, e.g. "s3 / metastore length differs" or "visible in metastore but not s3".

In general we do not support a mixed mode where some clients use S3Guard and others do not: it is not safe. However, if there is a well-known path where only an external process (e.g. ETL) is dropping files for ingest, it may be nice to be able to support that narrower case. I think the existing behavior, where list checks both S3 and the MetadataStore, is sufficient without this change, though.

> S3Guard: make short-circuit getFileStatus() configurable
> --------------------------------------------------------
>
>                 Key: HADOOP-14468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14468
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>
> Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a
> result from the MetadataStore (e.g. DynamoDB) first.
> I would like to add a new parameter
> {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps
> the current behavior. When false, S3AFileSystem will check both S3 and the
> MetadataStore.
> I'm not sure yet if we want to have this behavior the same for all callers of
> getFileStatus(), or if we only want to check both S3 and MetadataStore for
> some internal callers such as open().
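
To make the proposal concrete, here is a minimal Java sketch of the two modes described above. It is not S3AFileSystem code and not a patch: the class and method names (GetFileStatusSketch, Checker, getFileLength) are hypothetical, and plain maps stand in for S3 and the MetadataStore. It also assumes that, when both are checked, the S3 result is the one returned and a disagreement is only counted; the issue does not say which side should win.

```java
import java.io.FileNotFoundException;
import java.util.Map;

class GetFileStatusSketch {

    /** Hypothetical helper; the maps (path -> length) stand in for the real stores. */
    static final class Checker {
        private final Map<String, Long> metadataStore;
        private final Map<String, Long> s3;
        // Models the proposed fs.s3a.metadatastore.getfilestatus.authoritative flag.
        private final boolean authoritative;
        private int mismatches;

        Checker(Map<String, Long> metadataStore, Map<String, Long> s3,
                boolean authoritative) {
            this.metadataStore = metadataStore;
            this.s3 = s3;
            this.authoritative = authoritative;
        }

        long getFileLength(String path) throws FileNotFoundException {
            Long fromStore = metadataStore.get(path);
            if (authoritative && fromStore != null) {
                // Current behavior: a MetadataStore hit short-circuits S3.
                return fromStore;
            }
            Long fromS3 = s3.get(path);
            if (fromS3 == null) {
                if (fromStore != null) {
                    mismatches++; // "visible in metastore but not s3"
                }
                // Fail fast, e.g. on open(), rather than on a later read.
                throw new FileNotFoundException(path);
            }
            if (fromStore != null && !fromStore.equals(fromS3)) {
                mismatches++; // "s3 / metastore length differs"
            }
            // Assumption: S3 wins when both are consulted.
            return fromS3;
        }

        int mismatchCount() {
            return mismatches;
        }
    }
}
```

With the flag set to true, a path present only in the MetadataStore still resolves; with it set to false, the same path raises FileNotFoundException and is counted as a mismatch.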