[ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873692#comment-16873692
 ] 

Sean Mackrory commented on HADOOP-16396:
----------------------------------------

Attaching my current state, although I'm not done. My tests are failing because 
when I list a directory that was just listed a minute ago, it still queries 
S3, even though authoritative mode should, in my understanding, be kicking in. 
The problem seems to be that the first listing doesn't perform a write-back, 
and sure enough, the metadata store never considers that directory listing 
authoritative. I ran all the tests (except the scale tests) and traced them to 
confirm that at no point do any of them write back with authoritative=true. I 
thought we used to have logic in listings that would conditionally do a 
write-back, and I assumed that recent work would have included flipping the 
authoritative bit. Am I missing something? [~gabor.bota] [[email protected]]

> Allow authoritative mode on a subdirectory
> ------------------------------------------
>
>                 Key: HADOOP-16396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16396
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Major
>         Attachments: HADOOP-16396.001.patch
>
>
> Let's allow authoritative mode to be applied only to a subset of a bucket. 
> This is coming primarily from a Hive warehousing use-case where Hive-managed 
> tables can benefit from query planning, but can't speak for the rest of the 
> bucket. This should be limited in scope and is not a general attempt to allow 
> configuration on a per-path basis, as configuration is currently done on a 
> per-process or a per-bucket basis.
> I propose a new property (we could overload 
> fs.s3a.metadatastore.authoritative, but that seems likely to cause confusion 
> somewhere). A string would be allowed that would then be qualified in the 
> context of the FileSystem, and used to check if it is a prefix for a given 
> path. If it is, we act as though authoritative mode is enabled. If not, we 
> revert to the existing behavior of fs.s3a.metadatastore.authoritative (which 
> in practice will probably be false, the default, if the new property is in 
> use).
> Let's be clear about a few things:
> * Currently authoritative mode only short-cuts the process to avoid a 
> round-trip to S3 if we know it is safe to do so. This means that even when 
> authoritative mode is enabled for a bucket, if the metadata store does not 
> have a complete (or "authoritative") current listing cached, authoritative 
> mode still has no effect. This will still apply.
> * This will only apply to getFileStatus and listStatus, and internal calls to 
> their internal counterparts. No other API is currently using authoritative 
> mode to change behavior.
> * This will only apply to getFileStatus and listStatus calls INSIDE the 
> configured prefix. If there is a recursive listing on the parent of the 
> configured prefix, no change in behavior will be observed.
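
To make the proposed check concrete, here is a minimal sketch of the prefix 
logic described above. It is not the attached patch and not S3A code: the 
class, method names, and string-based path handling are all hypothetical, 
chosen only to illustrate qualifying the configured string, testing it as a 
prefix, and falling back to the existing bucket-wide setting.

```java
// Hypothetical sketch only; none of these names come from S3A or the patch.
final class AuthoritativePrefixSketch {
    // Qualified prefix ending in "/", or null when the new property is unset.
    private final String authoritativePrefix;
    // Stands in for the existing fs.s3a.metadatastore.authoritative value.
    private final boolean bucketAuthoritative;

    AuthoritativePrefixSketch(String fsRoot, String configuredPath,
                              boolean bucketAuthoritative) {
        this.authoritativePrefix =
            (configuredPath == null || configuredPath.isEmpty())
                ? null
                : qualify(fsRoot, configuredPath);
        this.bucketAuthoritative = bucketAuthoritative;
    }

    // Qualify a path against the filesystem root (assumed form: "s3a://bucket")
    // and add a trailing slash so "/warehouse" does not match "/warehouse2".
    static String qualify(String fsRoot, String path) {
        String qualified;
        if (path.contains("://")) {
            qualified = path;
        } else if (path.startsWith("/")) {
            qualified = fsRoot + path;
        } else {
            qualified = fsRoot + "/" + path;
        }
        return qualified.endsWith("/") ? qualified : qualified + "/";
    }

    // True if the path is inside the configured prefix; otherwise fall back
    // to the bucket-wide setting.
    boolean allowAuthoritative(String qualifiedPath) {
        if (authoritativePrefix != null) {
            String p = qualifiedPath.endsWith("/")
                ? qualifiedPath : qualifiedPath + "/";
            if (p.startsWith(authoritativePrefix)) {
                return true;
            }
        }
        return bucketAuthoritative;
    }

    // Authoritative mode only skips the S3 round-trip when the cached
    // listing is itself marked authoritative.
    boolean canSkipS3(String qualifiedPath, boolean cachedListingIsAuthoritative) {
        return allowAuthoritative(qualifiedPath) && cachedListingIsAuthoritative;
    }
}
```

With a prefix of /warehouse and the bucket-wide setting left at false, a 
listing of s3a://bucket/warehouse/t1 would be eligible, while a listing of 
s3a://bucket/ (the parent of the prefix) would behave as it does today.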



