[
https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623076#comment-15623076
]
Aaron Fabbri commented on HADOOP-13651:
---------------------------------------
Great comments, thank you.
{quote}
Wrap with a LOG.isDebugEnabled()
{quote}
Good call, will do. I'll roll another patch later today with this.
{quote}
The is empty dir logic here is getting ugly already. I think it may be best to
revisit that entire empty-dir logic
{quote}
Totally agree. Having a FileStatus which is cacheable (doesn't change based on
activity in other files) is sort of a prerequisite for playing nicely with the
MetadataStore division of labor. I can reason about the logic here, but the
isS3A flag in LocalMetadataStore is a layering violation, IMO.
My thought was to work on isEmptyDirectory as a separate effort after S3Guard
v1 is merged. I kept the isS3A stuff separated for that reason. I think we
may want to revisit the invariants around the empty directory blobs as well.
E.g. instead of "exists(empty directory blob) iff directory is empty" the
condition would be "exists(empty directory blob) implies there is a directory
at that path", which is only necessary if there are no other keys with a
matching prefix. I wonder if being lazier about cleaning up those blobs would
improve s3a perf, etc.
{quote}
If, instead of asking of the s3a status, it was just something which could be
queried off the metadata store, then it gets to implement the logic behind
S3Guard.isEmptyDirectory(metadatastore, s3afilestatus)
{quote}
MetadataStore may give one of three answers to this question:
(1) Yes, that path is an empty directory
(2) No, that path is not an empty directory
(3) I do not have full state for that directory, I cannot answer.
This could be used to do the right thing in the client, but it may take some
refactoring. Let me know how you want to tackle this part. I'd vote to defer
work on this part to a separate jira because we want to keep other parts of the
project moving/integrated, and i think this will be tricky enough to benefit
from a separate code review and discussion. The existing solution is bad
layering but keeps the semantics of isEmptyDirectory as is for S3A.
Also, on TODOs in code in the feature branch. Intent is to commit frequently so
folks can keep working in parallel, and that all TODOs will be addressed before
merging the feature branch.
> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
> Key: HADOOP-13651
> URL: https://issues.apache.org/jira/browse/HADOOP-13651
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Attachments: HADOOP-13651-HADOOP-13345.001.patch,
> HADOOP-13651-HADOOP-13345.002.patch, HADOOP-13651-HADOOP-13345.003.patch
>
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata
> consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is
> configured.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]