[
https://issues.apache.org/jira/browse/HADOOP-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893332#comment-15893332
]
Aaron Fabbri commented on HADOOP-13914:
---------------------------------------
Thanks for the review [~mackrorysd].
{quote}
I'm a bit concerned about all the S3AFileStatus -> FileStatus changes (although
I could've sworn that we already removed that assertion in
PathMetadataTranslation...). I think it's definitely a change for the better,
but I'm a little worried there is some application out there that this change
will break, despite being @Private and @Evolving...
{quote}
Yes, this is a fundamental challenge for this patch. I explicitly want to
remove isEmptyDirectory from the public API. It should only be used internally
because we don't want the cost of always computing it, when it is usually never
used.
{quote}
Along similar lines, I wondered if a `public S3AFileStatus getFileStatus(Path,
bool)` function might be in order in S3AFileSystem? No idea how much it'll get
used, but if anyone IS depending on the current S3AFileStatus return type and
wants that work done it'd be useful.
{quote}
My understanding is that having isEmptyDirectory() on the S3AFileStatus is an
internal optimization to save a round trip for delete and rename. Without a
compelling need for this shortcut in the public API, I would suggest people
just use {{listStatus(dir)}} to determine emptiness.
{quote}
There's an assertion in TestDynamoDBMetadataStore.java that isEmptyDirectory()
is what we expect. Why is that removed? Is it a Boolean -> Tristate issue? If
so, shouldn't we modify the logic to accept either the expected one or UNKNOWN?
{quote}
After this patch, MetadataStore implementations are not required to return
S3AFileStatus with isEmptyDirectory set properly (which was broken anyways for
DDB). So that assertion no longer makes sense. I should probably:
- Replace that assert with one that checks PathMetadata#isEmptyDirectory() does
not conflict with expected value (i.e. UNKNOWN is always correct, but false
versus true would be a failure).
- We should probably have some new MetadataStore contract test cases around
{{PathMetadata#isEmptyDirectory}}. I can add one in the next patch.
> s3guard: improve S3AFileStatus#isEmptyDirectory handling
> --------------------------------------------------------
>
> Key: HADOOP-13914
> URL: https://issues.apache.org/jira/browse/HADOOP-13914
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Attachments: HADOOP-13914-HADOOP-13345.000.patch,
> HADOOP-13914-HADOOP-13345.002.patch, HADOOP-13914-HADOOP-13345.003.patch,
> s3guard-empty-dirs.md, test-only-HADOOP-13914.patch
>
>
> As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag
> stored in S3AFileStatus is missing from DynamoDBMetadataStore.
> The approach taken by LocalMetadataStore is not suitable for the DynamoDB
> implementation, and also sacrifices good code separation to minimize
> S3AFileSystem changes pre-merge to trunk.
> I will attach a design doc that attempts to clearly explain the problem and
> preferred solution. I suggest we do this work after merging the HADOOP-13345
> branch to trunk, but am open to suggestions.
> I can also attach a patch of a integration test that exercises the missing
> case and demonstrates a failure with DynamoDBMetadataStore.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]