[
https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004155#comment-16004155
]
Aaron Fabbri commented on HADOOP-13760:
---------------------------------------
I did not finish a full review yet but I have a couple of comments to start...
{code}
/** Map of key to delay -> time it was created. */
- private Map<String, Long> delayedKeys = new HashMap<>();
+ private Map<String, Long> delayedPutKeys = new HashMap<>();
+
+ /** Map of key to delay -> time it was deleted. */
+ private Map<String, Long> delayedDeleteKeys = new HashMap<>();
+
+ /** Map of delayed deleted key -> object summary. */
+ private Map<String, S3ObjectSummary> delayedDeleteSummaries = new
HashMap<>();
+
+ /** Set of delayed prefixes (may include directories not in
+ * delayedDeletedSummaries */
+ private Set<String> delayedDeletePrefixes = new HashSet<>();
+
{code}
Curious if we can combine three deleted item maps to a single map holding
a struct instead...
{code}
+ private ObjectListing restoreListObjects(ListObjectsRequest request,
+ ObjectListing rawListing) {
+ List<S3ObjectSummary> outputList = rawListing.getObjectSummaries();
+ for (String key : new HashSet<>(delayedDeleteSummaries.keySet())) {
{code}
Why create a temporary HashSet just to iterate over keys?
{code}
+ if (isKeyDelayed(delayedDeleteKeys, key)) {
+ outputList.add(delayedDeleteSummaries.get(key));
{code}
(1) can this add duplicates? (2) How do you know this delayed item (e.g.
/a/b/c) belongs in this listing request (e.g. list /x/y/x)?
{code}
+ if (createOnS3) {
String key = pathToKey(f);
createFakeDirectory(key);
- S3Guard.makeDirsOrdered(metadataStore, metadataStoreDirs, username);
- deleteUnnecessaryFakeDirectories(f.getParent());
- return true;
}
+ S3Guard.makeDirsOrdered(metadataStore, metadataStoreDirs, username,
+ getConf().getBoolean(
+ Constants.METADATASTORE_AUTHORITATIVE,
+ Constants.DEFAULT_METADATASTORE_AUTHORITATIVE));;
{code}
I don't think we want this new argument added to {{makeDirsOrdered}}.
makeDirsOrdered
only operates on newly created directories. This means, by definition, we
know the full contents of all listed directories, since they were just created.
This is a common point of confusion. I should have named the two types of
"authoritative" separately:
1. DirListingMetadata.isAuthoritative(): True iff it contains the full listing
of the directory. (Maybe I should rename this isCompleteListing())
2. fs.s3a.metadatastore.authoriatative: S3A behavior: When true, S3A is
allowed
to skip round trips to S3 if the MetadataStore provides a full listing.
#2 generally should not need to be plumbed outside of S3AFileSystem.
{code}
- private S3AFileStatus s3GetFileStatus(final Path path, String key)
- throws IOException {
+ private S3AFileStatus s3GetFileStatus(final Path path, String key,
+ Set<Path> tombstones) throws IOException {
{code}
My first thought is: can we do this as a wrapper function? This is supposed to
be s3-level
logic but we're adding metadatastore tombstones to it. Let me think about this
a bit.
{code}
+ boolean isEmpty = true;
+ if (tombstones != null) {
+ for (String prefix : prefixes) {
+ Path absolute = qualify(new Path("/" + prefix));
+ if (!tombstones.contains(absolute)) {
+ isEmpty = false;
+ break;
+ }
+ }
+ if (isEmpty) {
+ for (S3ObjectSummary summary : summaries) {
+ Path absolute = qualify(new Path("/" + summary.getKey()));
+ if (!tombstones.contains(absolute)) {
+ isEmpty = false;
+ break;
+ }
+ }
+ }
+ }
+ return new S3AFileStatus(Tristate.fromBool(isEmpty), path, username);
{code}
Related: this would also be easier to unit test if separated into its own
function.
> S3Guard: add delete tracking
> ----------------------------
>
> Key: HADOOP-13760
> URL: https://issues.apache.org/jira/browse/HADOOP-13760
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Aaron Fabbri
> Assignee: Sean Mackrory
> Attachments: HADOOP-13760-HADOOP-13345.001.patch,
> HADOOP-13760-HADOOP-13345.002.patch, HADOOP-13760-HADOOP-13345.003.patch,
> HADOOP-13760-HADOOP-13345.004.patch, HADOOP-13760-HADOOP-13345.005.patch,
> HADOOP-13760-HADOOP-13345.006.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add
> delete tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.
> To make deletes consistent, we need to add a {{isDeleted}} flag to
> {{PathMetadata}} and check it when returning results from functions like
> {{getFileStatus()}} and {{listStatus()}}. In HADOOP-13651, I added TODO
> comments in most of the places these new conditions are needed. The work
> does not look too bad.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]