bgaborg commented on a change in pull request #1691: HADOOP-16424. S3Guard
fsck: Check internal consistency of the MetadataStore
URL: https://github.com/apache/hadoop/pull/1691#discussion_r348534984
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardFsck.java
##########
@@ -396,6 +409,197 @@ public Path getPath() {
}
}
+ /**
+ * Tasks to do here:
+ *
+ * - find orphan entries (entries without a parent)
+ * - find if a file's parent is not a directory (so the parent is a file)
+ * - warn: no lastUpdated field
+ * - entries where the parent is a tombstone
+ * @param ddbms the metadatastore to check
+ */
+ public void checkDdbInternalConsistency (DynamoDBMetadataStore ddbms,
+ Path rootPath) throws IOException {
+
+ Preconditions.checkArgument(rootPath.isAbsolute(), "path must be absolute");
+
+ String rootStr = rootPath.toString();
+ LOG.info("Rootstr: {}", rootStr);
+
+ final Table table = ddbms.getTable();
+ DDBTree ddbTree = new DDBTree();
+
+ /**
+ * I. Root node construction
+ * - If the root node is the real bucket root, a node is constructed instead of
+ * doing a query to the ddb because the bucket root is not stored.
+ * - If the root node is not a real bucket root then the entry is queried from
+ * the ddb and constructed from the result.
+ */
+
+ DDBPathMetadata rootMeta;
+
+ if (!rootPath.isRoot()) {
+ PrimaryKey rootKey = pathToKey(rootPath);
+ final GetItemSpec spec = new GetItemSpec()
+ .withPrimaryKey(rootKey)
+ .withConsistentRead(true);
+ final Item rootItem = table.getItem(spec);
+ rootMeta = itemToPathMetadata(rootItem, ddbms.getUsername());
+ } else {
+ rootMeta = new DDBPathMetadata(
+ new S3AFileStatus(Tristate.UNKNOWN, rootPath, ddbms.getUsername())
+ );
+ }
+
+ DDBTreeNode root = new DDBTreeNode(rootMeta);
+ ddbTree.addNode(root);
+ ddbTree.setRoot(root);
+
+ /**
+ * II. Build the descendant tree:
+ * 1. query all nodes where the parent is our root, and put it in the tree
+ * 2. Add edges to the tree: connect each node with the parent.
+ * - This should be done in O(n): we only need to find the parent based on the
+ * path with a hashmap lookup.
+ *
+ * 3. Do a test if the graph is connected - find orphan entries
+ *
+ * 4. Do test the elements for errors:
+ * - File is a parent of a file.
+ * - Entries where the parent is tombstoned but the entries are not.
+ * - Warn on no lastUpdated field.
+ */
+ ExpressionSpecBuilder builder = new ExpressionSpecBuilder();
+ builder.withCondition(
+ ExpressionSpecBuilder.S("parent")
+ .beginsWith(pathToParentKey(rootPath))
+ );
+ final IteratorSupport<Item, ScanOutcome> resultIterator = table.scan(
Review comment:
I'll use a stopwatch here, as in the other check, to stay consistent.
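For reference, step II of the quoted comment (parent lookup through a hash map, then an orphan check) together with the timing mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `OrphanCheckSketch`, `parentOf` and `findOrphans` are hypothetical names, entries are plain absolute path strings rather than `DDBPathMetadata` items, and `System.nanoTime()` stands in for whichever stopwatch class the other check uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from S3GuardFsck.
public class OrphanCheckSketch {

  /**
   * Returns the parent of an absolute, slash-separated path,
   * or null for the root "/". Paths are assumed to be absolute
   * and to have no trailing slash.
   */
  public static String parentOf(String path) {
    if (path.equals("/")) {
      return null;
    }
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? "/" : path.substring(0, idx);
  }

  /**
   * Returns the entries whose direct parent is neither the root
   * nor another entry. One pass fills the hash set and one pass
   * looks up each parent, so the work is O(n) on average.
   */
  public static List<String> findOrphans(String root, List<String> entries) {
    Set<String> known = new HashSet<>(entries);
    known.add(root);
    List<String> orphans = new ArrayList<>();
    for (String entry : entries) {
      if (entry.equals(root)) {
        continue; // the root itself has no parent to check
      }
      String parent = parentOf(entry);
      if (parent == null || !known.contains(parent)) {
        orphans.add(entry);
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    // "/c/d/file2" has no "/c/d" entry, so it is reported as an orphan.
    List<String> entries = List.of("/a", "/a/b", "/a/b/file1", "/c/d/file2");

    long start = System.nanoTime(); // stand-in for a stopwatch
    List<String> orphans = findOrphans("/", entries);
    long elapsedMicros = (System.nanoTime() - start) / 1_000;

    System.out.println("orphans=" + orphans + " in " + elapsedMicros + " us");
  }
}
```

The sketch only checks for a missing direct parent; the other tests listed in the comment (a file as the parent of a file, tombstoned parents, a missing lastUpdated field) would need the entry metadata and are left out.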