[
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567775#comment-16567775
]
Aaron Fabbri commented on HADOOP-14154:
---------------------------------------
Thanks for the v2 patch. The DDBPathMetadata stuff turned out pretty good.
Was thinking we could probably reduce garbage a bit with some additional
constructors for that class, but actually I think JVM escape analysis will
handle most of those extra allocations on the stack. The lists of metadata are
harder to deal with but I think your code looks fine here.
Some inline comments...
{noformat}
@@ -840,21 +861,36 @@ public void prune(long modTime, String keyPrefix) throws
IOException {
new ArrayList<>(S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT);
int delay = conf.getInt(S3GUARD_DDB_BACKGROUND_SLEEP_MSEC_KEY,
S3GUARD_DDB_BACKGROUND_SLEEP_MSEC_DEFAULT);
+ Set<Path> parentPathSet = new HashSet<>();
for (Item item : expiredFiles(modTime, keyPrefix)) {
- PathMetadata md = PathMetadataDynamoDBTranslation
+ DDBPathMetadata md = PathMetadataDynamoDBTranslation
.itemToPathMetadata(item, username);
Path path = md.getFileStatus().getPath();
deletionBatch.add(path);
+
+ // add parent path of what we remove
+ Path parentPath = path.getParent();
+ parentPathSet.add(parentPath);
{noformat}
What if parentPath is root dir? I think we want a null check here.
{noformat}
+ private void removeAuthoritativeDirFlag(Set<Path> pathSet) {
+ Set<DDBPathMetadata> metas = pathSet.stream().map(path -> {
+ try {
+ DDBPathMetadata ddbPathMetadata = get(path);
+ if(ddbPathMetadata == null) {
+ return null;
+ }
+ LOG.debug("Setting false isAuthoritativeDir on {}", ddbPathMetadata);
+ ddbPathMetadata.setAuthoritativeDir(false);
+ return ddbPathMetadata;
+ } catch (IOException e) {
+ String msg = String.format("IOException while getting PathMetadata "
+ + "on path: %s.", path);
+ LOG.error(msg, e);
+ return null;
+ }
+ }).filter(Objects::nonNull).collect(Collectors.toSet());
{noformat}
I like that the stream keeps running if one of the paths fail, but should we
also save a reference to the IOException and then throw it at the end of the
function? That way we keep working even if we get a failure, but the failure
still gets propagated to the caller.
{noformat}
---
a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
+++
b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
@@ -727,6 +727,13 @@ public void testPruneUnsetsAuthoritative() throws
Exception {
new FileStatus(0, false, 0, 0, time + 1, strToPath(freshFile)),
Tristate.FALSE, false));
+ // set parent dir as authoritative
+ if (!allowMissing()) {
+ DirListingMetadata parentDirMd = ms.listChildren(strToPath(parentDir));
+ parentDirMd.setAuthoritative(true);
+ ms.put(parentDirMd);
+ }
{noformat}
Looks like you found a bug in the existing test case? Nice work.
I was wondering if we want an integration test to confirm forward and backward
compatibility with the schema change (old S3a works with new schema rows and
vice versa). Not sure how we'd implement that though. Would probably need a
separate copy of a couple of the PathMetadataDynamoDBTranslation functions with
the old logic in them (or better yet, a boolean flag to select old behavior w/o
the read/write of the auth flag), and then use those in the DDB integration
test to confirm it all works. I'm not sure what this buys us in terms of
regression testing though--so I could see the argument for manual testing. What
do you think?
> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -----------------------------------------------------------------------------
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Rajesh Balamohan
> Assignee: Gabor Bota
> Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch,
> HADOOP-14154-HADOOP-13345.002.patch, HADOOP-14154-spec-001.pdf,
> HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch
>
>
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in
> {{DirListingMetadata.isAuthoritative}}.
> This topic has been super confusing for folks so I will also file a
> documentation Jira to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to
> .isFullListing to eliminate the multiple uses and meanings of the word
> "authoritative".
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]