[
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876116#comment-16876116
]
Gabor Bota commented on HADOOP-16396:
-------------------------------------
Thanks for working on this [~mackrorysd]! I'm ok with the refactor in the test.
The {{S3AFileSystem#allowAuthoritative}} method could go into {{S3Guard.java}}
instead of the FS for two reasons: we try to avoid adding more content to the
FS file itself, and we try to factor things out of it, because it's already
huge and just getting bigger. But if it looks like it would be longer to call a
static method from S3Guard, then just go ahead and use this solution.
Minor naming thing: in {{ITestAuthoritativePath}} you use {{unguardedFS}}, but
in other tests we use {{rawFs}} (e.g. in {{ITestS3GuardOutOfBandOperations}}). I
think these are the same, so we could use the same name (also for {{guardedFs}}
and {{fullyAuthFS}}). Or is there a difference between those?
Function placement: {{ITestAuthoritativePath#createUnguardedFS}} and
{{ITestS3GuardOutOfBandOperations#createUnguardedFS}} do the same thing, so
they could be factored out into a utility function to avoid code duplication.
I almost did this refactor in a previous improvement I was working on, but
there were no other usages of this function besides
{{ITestS3GuardOutOfBandOperations}}, so I left it there. Now I think it's time
to do this. The same is true for {{createGuardedFS}} and {{createFullyAuthFS}}.
nit: there are no usages of {{ITestAuthoritativePath#createNonAuthFS}}.
> Allow authoritative mode on a subdirectory
> ------------------------------------------
>
> Key: HADOOP-16396
> URL: https://issues.apache.org/jira/browse/HADOOP-16396
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Priority: Major
> Attachments: HADOOP-16396.001.patch, HADOOP-16396.002.patch,
> HADOOP-16396.003.patch
>
>
> Let's allow authoritative mode to be applied only to a subset of a bucket.
> This is coming primarily from a Hive warehousing use-case where Hive-managed
> tables can benefit from query planning, but can't speak for the rest of the
> bucket. This should be limited in scope and is not a general attempt to allow
> configuration on a per-path basis, as configuration is currently done on a
> per-process or per-bucket basis.
> I propose a new property (we could overload
> fs.s3a.metadatastore.authoritative, but that seems likely to cause confusion
> somewhere). A string would be allowed that would then be qualified in the
> context of the FileSystem, and used to check if it is a prefix for a given
> path. If it is, we act as though authoritative mode is enabled. If not, we
> revert to the existing behavior of fs.s3a.metadatastore.authoritative (which
> in practice will probably be false, the default, if the new property is in
> use).
> Let's be clear about a few things:
> * Currently authoritative mode only short-cuts the process to avoid a
> round-trip to S3 if we know it is safe to do so. This means that even when
> authoritative mode is enabled for a bucket, if the metadata store does not
> have a complete (or "authoritative") current listing cached, authoritative
> mode still has no effect. This will still apply.
> * This will only apply to getFileStatus and listStatus, and internal calls to
> their internal counterparts. No other API is currently using authoritative
> mode to change behavior.
> * This will only apply to getFileStatus and listStatus calls INSIDE the
> configured prefix. If there is a recursive listing on the parent of the
> configured prefix, no change in behavior will be observed.
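
For illustration only, the prefix check described in the issue might look roughly like the sketch below. It uses plain strings rather than Hadoop's {{Path}} and {{Configuration}} classes, and the class name, method names, and parameters are hypothetical; this is not the code in the attached patches. It assumes both the path and the configured prefix have already been qualified against the FileSystem.

```java
/**
 * Hypothetical sketch of the "authoritative on a subdirectory" decision.
 * All names here are illustrative assumptions, not the S3A implementation.
 */
final class AuthoritativePathSketch {

  private AuthoritativePathSketch() {
  }

  /** Append a trailing slash so "/warehouse" does not match "/warehouse2". */
  static String withTrailingSlash(String path) {
    return path.endsWith("/") ? path : path + "/";
  }

  /**
   * @param qualifiedPath       fully qualified path being queried
   * @param authoritativePrefix qualified prefix from the new property
   *                            (null or empty if the property is unset)
   * @param bucketAuthoritative existing bucket-wide setting, i.e. the value
   *                            of fs.s3a.metadatastore.authoritative
   * @return true if the path should be treated as authoritative
   */
  static boolean allowAuthoritative(String qualifiedPath,
      String authoritativePrefix,
      boolean bucketAuthoritative) {
    if (authoritativePrefix != null && !authoritativePrefix.isEmpty()) {
      String candidate = withTrailingSlash(qualifiedPath);
      if (candidate.startsWith(withTrailingSlash(authoritativePrefix))) {
        // Inside the configured prefix: act as though authoritative mode is on.
        return true;
      }
    }
    // Outside the prefix (or no prefix set): fall back to existing behavior.
    return bucketAuthoritative;
  }
}
```

With a prefix of {{s3a://bucket/warehouse}} and the bucket-wide flag left at false, this sketch returns true for {{s3a://bucket/warehouse/t1}} and false for both {{s3a://bucket/warehouse2/x}} and the parent {{s3a://bucket}}, which matches the third point in the description: a listing on the parent of the configured prefix sees no change in behavior.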
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)