[ 
https://issues.apache.org/jira/browse/HADOOP-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603849#comment-15603849
 ] 

Aaron Fabbri commented on HADOOP-13756:
---------------------------------------

Hi [~eddyxu]. Thanks for putting together this good description.  I've been 
meaning to rewrite part of LocalMetadataStore for the reason you outline here.  
(Tests pass because clients fall back to the backing store when 
get(PathMetadata) returns null.  Also getFileStatus() calls and file creations 
cause much of the PathMetadata to be recorded.)
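
To illustrate why the fallback hides the bug, here is a minimal sketch of a 
read path that consults the metadata store first and falls back to the 
backing store on a miss.  FallbackLookup and its fields are hypothetical 
names used only for this example; they are not the S3A client code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; FallbackLookup is not a Hadoop class.
final class FallbackLookup {
    // Stand-in for the MetadataStore: path -> status string.
    private final Map<String, String> metadataStore = new HashMap<>();
    // Stand-in for the backing store (e.g. S3); returns null if the path is absent.
    private final Function<String, String> backingStore;
    // Counts how often the slow path was taken.
    int backingStoreCalls = 0;

    FallbackLookup(Function<String, String> backingStore) {
        this.backingStore = backingStore;
    }

    String getFileStatus(String path) {
        String cached = metadataStore.get(path);
        if (cached != null) {
            return cached;                    // served from the metadata store
        }
        backingStoreCalls++;                  // miss: fall back to the backing store
        String status = backingStore.apply(path);
        if (status != null) {
            metadataStore.put(path, status);  // record it for later lookups
        }
        return status;
    }
}
```

A missing entry in the metadata store therefore costs an extra round trip 
rather than a wrong answer, which is why functional tests alone do not 
catch it.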

Two issues here:

(1) LocalMetadataStore implementation
(2) Design of Interface: Is DirListingMetadata required?

#1. I need to rework the data structures here.  Keeping two copies of each 
FileStatus is silly.  "Two hashtables" was a quick prototype that needs to be 
replaced.  Callers of the MetadataStore interface should not have to do a 
separate put() for each child in a directory; those FileStatuses are already 
included in the put(DirListingMetadata).
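
One possible shape for such a rework, sketched under assumptions: a single 
map holds the one copy of each entry, and a per-directory index stores only 
child paths.  SingleCopyStore, putListing() and the use of plain strings for 
paths and statuses are all hypothetical; this is not the LocalMetadataStore 
code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions.
final class SingleCopyStore {
    // The only copy of each entry: full path -> status string.
    private final Map<String, String> entries = new HashMap<>();
    // Per-directory index of child paths; no second copy of the status.
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Map<String, Boolean> authoritative = new HashMap<>();

    // Batched put: one call records every child and the authoritative bit.
    void putListing(String dir, Map<String, String> listing, boolean isAuthoritative) {
        Set<String> names = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : listing.entrySet()) {
            entries.put(e.getKey(), e.getValue());
            names.add(e.getKey());
        }
        children.put(dir, names);
        authoritative.put(dir, isAuthoritative);
    }

    // Single-entry lookup sees entries that arrived via putListing().
    String get(String path) {
        return entries.get(path);
    }

    // Directory listing is rebuilt from the index plus the single entry map.
    List<String> list(String dir) {
        List<String> out = new ArrayList<>();
        for (String child : children.getOrDefault(dir, new LinkedHashSet<>())) {
            out.add(entries.get(child));
        }
        return out;
    }

    boolean isAuthoritative(String dir) {
        return authoritative.getOrDefault(dir, false);
    }
}
```

With this layout a caller makes one putListing() call and a later get() on 
any child succeeds without a separate put() per child.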

#2. Do we need the "batched" API of put(DirListingMetadata)?  Here was the 
thought process so far:

You can think of DirListingMetadata as "results of listStatus() plus an 
authoritative bit".

I thought about removing DirListingMetadata and just doing put()/get() on 
PathMetadata for each directory entry.  Then we need a separate 
setAuthoritative(path, boolean) function.  Does this open up new race 
conditions?

If Client A is putting the results of a listStatus() into MetadataStore, one by 
one, then calling setAuthoritative(parent), while Client B is putting or 
deleting entries into the same directory, maybe there is no race there.  Maybe 
we think of your proposed setAuthoritative(path, boolean) function as a marker 
in time, after which, the MetadataStore knows the full contents of the 
directory, instead of put(DirListingMeta, authoritative=true) as "this is the 
current snapshot of the full directory contents".
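
The "marker in time" reading can be sketched as follows.  MarkerStore and 
its methods are hypothetical and use plain strings for paths; the point is 
only that once the flag is set, later put() and delete() calls that go 
through the store keep its view of the directory complete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the per-entry alternative; not an existing interface.
final class MarkerStore {
    private final Map<String, Set<String>> childrenByDir = new HashMap<>();
    private final Map<String, Boolean> authoritative = new HashMap<>();

    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    void put(String path) {
        childrenByDir.computeIfAbsent(parentOf(path), k -> new TreeSet<>()).add(path);
    }

    void delete(String path) {
        Set<String> siblings = childrenByDir.get(parentOf(path));
        if (siblings != null) {
            siblings.remove(path);
        }
    }

    // Marker in time: from here on the store claims to know the full listing.
    void setAuthoritative(String dir, boolean value) {
        authoritative.put(dir, value);
    }

    boolean isAuthoritative(String dir) {
        return authoritative.getOrDefault(dir, false);
    }

    // Returns a copy so callers cannot mutate the store's state.
    Set<String> list(String dir) {
        return new TreeSet<>(childrenByDir.getOrDefault(dir, new TreeSet<>()));
    }
}
```

For example, after put("/d/a"), put("/d/b") and setAuthoritative("/d", true), 
a later put("/d/c") and delete("/d/a") leave the directory authoritative 
with exactly /d/b and /d/c listed.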

If we are implementing directory-level cache invalidation (probably necessary 
for S3AFileStatus#isEmptyDirectory(), and maybe as a CLI operation), it could be 
a little tricky.  If Client A is doing its sequence {set(child_meta_1), 
set(child_meta_2), ..., setAuthoritative(parent_path, true)} and Client B needs 
to invalidate the parent directory in the middle of that stream, I'm not sure 
how that would work.  The DirListingMetadata approach at least makes it 
possible for implementations to handle it, even though many (e.g. DynamoDB) will 
likely not handle that case.
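
The interleaving in question can be replayed deterministically in a single 
thread.  InvalidationRace is a hypothetical stand-in with string paths; a 
real implementation would additionally need locking or a transactional 
write to make the batched call atomic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; replays the A/B interleaving without real threads.
final class InvalidationRace {
    private final Map<String, List<String>> listings = new HashMap<>();
    private final Map<String, Boolean> authoritative = new HashMap<>();

    // Per-entry API: one child at a time, flag set separately.
    void putChild(String dir, String child) {
        listings.computeIfAbsent(dir, k -> new ArrayList<>()).add(child);
    }

    void setAuthoritative(String dir, boolean value) {
        authoritative.put(dir, value);
    }

    // Batched API: listing and flag are replaced in one step.
    void putListing(String dir, List<String> children, boolean isAuthoritative) {
        listings.put(dir, new ArrayList<>(children));
        authoritative.put(dir, isAuthoritative);
    }

    // Directory-level invalidation: forget the listing and the flag.
    void invalidate(String dir) {
        listings.remove(dir);
        authoritative.remove(dir);
    }

    List<String> list(String dir) {
        return new ArrayList<>(listings.getOrDefault(dir, new ArrayList<>()));
    }

    boolean isAuthoritative(String dir) {
        return authoritative.getOrDefault(dir, false);
    }
}
```

Replaying Client A's per-entry sequence with Client B's invalidate("/d") 
between putChild("/d", "a") and putChild("/d", "b") ends with the directory 
marked authoritative while child "a" is missing.  With putListing(), the 
invalidation lands either before or after the whole listing, so the store 
is left either complete and authoritative or empty and non-authoritative.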

For #1, I will fix the LocalMetadataStore and add tests to catch this sort of 
case.

For #2, I'd prefer to keep this interface until we get the major patches merged 
(HADOOP-13631, HADOOP-13651, and HADOOP-13449) and then do a followup JIRA for 
any interface changes.  I'm open to suggestions, though.  What do you think?

> LocalMetadataStore#put(DirListingMetadata) should also put file metadata into 
> fileHash.
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13756
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13756
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Lei (Eddy) Xu
>
> {{LocalMetadataStore#put(DirListingMetadata)}} only puts the metadata into 
> {{dirHash}}, thus all {{FileStatus}} s are missing from 
> {{LocalMetadataStore#fileHash()}}, which makes it confusing to use.
> So currently, to correctly put file statuses into the store (and also 
> set the {{authoritative}} flag), you need to run  {code}
> List<PathMetadata> metas = new ArrayList<PathMetadata>();
> boolean authoritative = true;
> for (S3AFileStatus status : files) {
>    PathMetadata meta = new PathMetadata(status);
>    store.put(meta);   // makes the entry visible in fileHash
>    metas.add(meta);   // collect it for the directory listing
> }
> DirListingMetadata dirMeta =
>     new DirListingMetadata(parent, metas, authoritative);
> store.put(dirMeta);
> {code}
> Calling {{store.put(dirMeta)}} alone is not correct, and calling 
> {{store.put(dirMeta)}} after putting every child {{FileStatus}} repeats 
> work already done. Can we just use a {{put(PathMetadata)}} and a 
> {{get/setAuthoritative()}} in the MetadataStore interface instead?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
