[
https://issues.apache.org/jira/browse/HADOOP-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886294#comment-16886294
]
Gabor Bota commented on HADOOP-16433:
-------------------------------------
Old deprecated description from HADOOP-16380:
{quote}If S3AFileSystem does an S3 LIST restricted to a single object to see if
a directory is empty, and the single entry found has a tombstone marker (either
from an inconsistent DDB Table or from an eventually consistent LIST) then it
will consider the directory empty, _even if there are one or more entries which
are not deleted_.
We need to make the check for whether a directory is empty resilient to this,
while keeping it efficient.
It surfaces as an issue in two places:
* delete(path) (where it may make things worse)
* rename(src, dest), where a check is made for dest != an empty
directory.{quote}
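A minimal Java sketch of the failure mode described in the quote and one tolerant alternative. {{EmptyDirCheck}} and {{ChildEntry}} are hypothetical stand-ins, not the S3AFileSystem or MetadataStore API, and the iterator is assumed to model a (possibly paged) listing of a directory's children. Looking at only the first entry reports the directory as empty when that entry is a tombstone; scanning until the first live entry avoids this while still stopping early in the common case.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Sketch: deciding whether a directory is empty when listed entries may be
 * tombstones. ChildEntry is a hypothetical stand-in, not a Hadoop class.
 */
public class EmptyDirCheck {

  /** Hypothetical listing entry: a path plus a tombstone (deleted) flag. */
  static final class ChildEntry {
    final String path;
    final boolean isDeleted;

    ChildEntry(String path, boolean isDeleted) {
      this.path = path;
      this.isDeleted = isDeleted;
    }
  }

  /**
   * The fragile check: look at a single entry only. If that entry is a
   * tombstone, the directory is reported empty even when later entries
   * are live.
   */
  static boolean isEmptyFirstEntryOnly(List<ChildEntry> children) {
    return children.isEmpty() || children.get(0).isDeleted;
  }

  /**
   * A tolerant check: keep scanning until a live (non-tombstone) entry is
   * found, returning as soon as one is seen.
   */
  static boolean isEmptyIgnoringTombstones(Iterator<ChildEntry> children) {
    while (children.hasNext()) {
      if (!children.next().isDeleted) {
        return false; // found a live child: not empty
      }
    }
    return true; // only tombstones (or nothing) were listed
  }

  public static void main(String[] args) {
    List<ChildEntry> listing = List.of(
        new ChildEntry("/dir/a", true),   // tombstone
        new ChildEntry("/dir/b", false)); // live file
    System.out.println(isEmptyFirstEntryOnly(listing));                // true (wrong)
    System.out.println(isEmptyIgnoringTombstones(listing.iterator())); // false
  }
}
```

The tolerant check still costs a single entry when the first child is live; it only reads further when it meets tombstones.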
> Filter expired entries and tombstones when listing with
> MetadataStore#listChildren
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-16433
> URL: https://issues.apache.org/jira/browse/HADOOP-16433
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Gabor Bota
> Assignee: Gabor Bota
> Priority: Major
>
> Currently, {{listChildren}} implementations don't filter out expired entries
> or tombstones. This can cause bugs and inconsistencies, so it should be fixed.
> For example, it can lead to a state we can't recover from, as in the
> following sequence:
> {{guarded and raw (OOB op) clients are doing ops to S3}}
> {noformat}
> Guarded: touch /AAAA
> Guarded: touch /ZZZZ
> Guarded: rm /AAAA {{-> tombstone in MS}}
> RAW: touch /AAAA/file.ext {{-> file is hidden with a tombstone}}
> Guarded: ls / {{-> the directory is empty}}
> {noformat}
> After we change the following code
> {code:java}
> final List<PathMetadata> metas = new ArrayList<>();
> for (Item item : items) {
>   DDBPathMetadata meta = itemToPathMetadata(item, username);
>   metas.add(meta);
> }
> {code}
> to
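> {code:java}
> final List<PathMetadata> metas = new ArrayList<>();
> for (Item item : items) {
>   DDBPathMetadata meta = itemToPathMetadata(item, username);
>   // handle expiry - only add entries that have not expired to the listing.
>   // lastUpdated == 0 means the field was never filled in, so keep the entry.
>   if (meta.getLastUpdated() == 0 ||
>       !meta.isExpired(ttlTimeProvider.getMetadataTtl(),
>           ttlTimeProvider.getNow())) {
>     metas.add(meta);
>   }
> }
> {code}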
> we will filter out expired entries from the listing, so we can recover from
> these kinds of OOB ops.
> Note: we have to handle the {{lastUpdated == 0}} case, where the
> {{lastUpdated}} field is not filled in, by keeping such entries.
> Note: this can only be fixed cleanly after HADOOP-16383 is fixed, because the
> MetadataStore (MS) needs access to the TTL time provider to handle expiry
> internally.
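The filtering proposed above can be sketched in self-contained Java. {{Meta}}, {{TtlProvider}} and the expiry rule in {{isExpired}} (an entry is treated as expired once {{lastUpdated + ttl}} is no longer in the future) are assumptions for illustration, not the Hadoop classes; only the shape of the condition, including keeping entries whose {{lastUpdated}} is 0, follows the snippet in the description.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of TTL-based filtering for a listChildren-style call.
 * Meta and TtlProvider are hypothetical stand-ins, not the Hadoop classes.
 */
public class ExpiryFilter {

  /** Hypothetical metadata entry; lastUpdated == 0 means "never filled in". */
  static final class Meta {
    final String path;
    final boolean isDeleted;  // true for a tombstone
    final long lastUpdated;   // epoch millis, or 0 if unknown

    Meta(String path, boolean isDeleted, long lastUpdated) {
      this.path = path;
      this.isDeleted = isDeleted;
      this.lastUpdated = lastUpdated;
    }

    /** Assumed rule: expired once lastUpdated + ttl is not in the future. */
    boolean isExpired(long ttlMillis, long nowMillis) {
      return lastUpdated + ttlMillis <= nowMillis;
    }
  }

  /** Hypothetical TTL time provider, playing the role of ttlTimeProvider. */
  interface TtlProvider {
    long getNow();
    long getMetadataTtl();
  }

  /** Keep entries with no timestamp, and entries that have not expired. */
  static List<Meta> filterExpired(List<Meta> entries, TtlProvider ttl) {
    List<Meta> kept = new ArrayList<>();
    for (Meta meta : entries) {
      if (meta.lastUpdated == 0
          || !meta.isExpired(ttl.getMetadataTtl(), ttl.getNow())) {
        kept.add(meta);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Fixed clock so the example is deterministic.
    TtlProvider ttl = new TtlProvider() {
      public long getNow() { return 10_000L; }
      public long getMetadataTtl() { return 1_000L; }
    };
    List<Meta> listing = List.of(
        new Meta("/AAAA", true, 5_000L),   // old tombstone: expired, dropped
        new Meta("/ZZZZ", false, 9_500L),  // recent entry: kept
        new Meta("/OLD", false, 0L));      // no timestamp: kept
    for (Meta m : filterExpired(listing, ttl)) {
      System.out.println(m.path);
    }
  }
}
```

With the fixed clock in {{main}}, the old tombstone for {{/AAAA}} is dropped, so it can no longer hide files written by a raw client, while {{/ZZZZ}} (recent) and {{/OLD}} (no timestamp) are printed.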
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)