hadoop-yetus commented on a change in pull request #1003: HADOOP-16384: Avoid inconsistencies between DDB and S3
URL: https://github.com/apache/hadoop/pull/1003#discussion_r301773319
##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
##########
@@ -1100,6 +1100,87 @@ property should be configured, and the name of that table should be different
incurring AWS charges.
+### How to Dump the Table and Metastore State
+
+There's an unstable entry point to list the contents of the S3Guard table
+and the S3 filesystem to a set of TSV files:
+
+```
+hadoop org.apache.hadoop.fs.s3a.s3guard.DumpS3GuardTable s3a://bucket-x/ dir/out
+```
+
+This generates a set of files prefixed `dir/out-`, each with a different view of the
+world, which can then be viewed on the command line or in an editor:
+
+```
+"type" "deleted" "path" "is_auth_dir" "is_empty_dir" "len" "updated" "updated_s" "last_modified" "last_modified_s" "etag" "version"
+"file" "true" "s3a://bucket/fork-0001/test/ITestS3AContractDistCp/testDirectWrite/remote" "false" "UNKNOWN" 0 1562171244451 "Wed Jul 03 17:27:24 BST 2019" 1562171244451 "Wed Jul 03 17:27:24 BST 2019" "" ""
+"file" "true" "s3a://bucket/Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/test-dir/1/5xlPpalRwv/test/new/newdir/file1" "false" "UNKNOWN" 0 1562171518435 "Wed Jul 03 17:31:58 BST 2019" 1562171518435 "Wed Jul 03 17:31:58 BST 2019" "" ""
+"file" "true" "s3a://bucket/Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/test-dir/1/5xlPpalRwv/test/new/newdir/subdir" "false" "UNKNOWN" 0 1562171518535 "Wed Jul 03 17:31:58 BST 2019" 1562171518535 "Wed Jul 03 17:31:58 BST 2019" "" ""
+"file" "true" "s3a://bucket/test/DELAY_LISTING_ME/testMRJob" "false" "UNKNOWN" 0 1562172036299 "Wed Jul 03 17:40:36 BST 2019" 1562172036299 "Wed Jul 03 17:40:36 BST 2019" "" ""
+```
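
As a hedged illustration of reading these files outside an editor, here is a minimal Python sketch. It assumes each file is tab-separated (as "TSV" suggests), quotes string fields with `"`, and starts with a header row naming the columns, as in the sample above. `read_dump` and `tombstone_paths` are hypothetical helpers, not part of Hadoop, and since the format is unstable they may need adjusting.

```python
import csv
from typing import Dict, Iterable, List


def read_dump(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse one dump file into a list of dicts, one per entry.

    Assumes tab-separated columns, double-quoted string fields and a
    header row naming the columns. Every value is returned as a string.
    """
    return list(csv.DictReader(lines, delimiter="\t", quotechar='"'))


def tombstone_paths(entries: List[Dict[str, str]]) -> List[str]:
    """Return the paths of entries whose `deleted` column is "true"."""
    return [e["path"] for e in entries if e["deleted"] == "true"]
```

For example, open `dir/out-scan.csv` with `newline=""` and pass the file object to `read_dump`; every row in the sample above would be reported by `tombstone_paths`, as all four entries are tombstones.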
+
+This is unstable: the output format may change without warning.
+To understand the meaning of the fields, consult the documentation.
+They are, currently:
+
+| field | meaning | source |
+|-------|---------|--------|
+| `type` | entry type, e.g. `file` | filestatus |
+| `deleted` | tombstone marker: `true` if the entry records a deletion | metadata |
+| `path` | path of the entry | filestatus |
+| `is_auth_dir` | whether the directory entry is authoritative | metadata |
+| `is_empty_dir` | whether the entry represents an empty directory; may be `UNKNOWN` | metadata |
+| `len` | file length in bytes | filestatus |
+| `updated` | time (millis) the metadata was last updated | metadata |
+| `updated_s` | `updated` as a string | metadata |
+| `last_modified` | last-modified time (millis) of the file status | filestatus |
+| `last_modified_s` | `last_modified` as a string | filestatus |
+| `etag` | etag, if any | filestatus |
+| `version` | object version ID, if any | filestatus |
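
To show how these fields might be interpreted, here is a hedged Python sketch that turns one row (a dict keyed by the header's column names) into typed values. It assumes the millisecond columns are epoch milliseconds (consistent with the sample, where `1562171244451` is 17:27:24 BST), that booleans are written as `true`/`false` in any case, and that `is_empty_dir` may be `UNKNOWN`. `DumpEntry` and `parse_entry` are hypothetical names, not Hadoop classes; the derived `*_s` string columns are skipped.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class DumpEntry:
    """Hypothetical typed view of one dump row."""
    entry_type: str               # `type` column
    deleted: bool                 # tombstone marker
    path: str
    is_auth_dir: bool
    is_empty_dir: Optional[bool]  # None when the value is UNKNOWN
    length: int                   # `len` column, bytes
    updated: datetime             # metadata update time (UTC)
    last_modified: datetime       # file status modification time (UTC)
    etag: str                     # empty string when absent
    version: str                  # empty string when absent


def millis_to_utc(value: str) -> datetime:
    """Convert an epoch-milliseconds column to an aware UTC datetime exactly."""
    return EPOCH + timedelta(milliseconds=int(value))


def tristate(value: str) -> Optional[bool]:
    """Map true/false (any case) to a bool and anything else to None."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return None


def parse_entry(row: Dict[str, str]) -> DumpEntry:
    """Build a DumpEntry from a dict keyed by the header row's column names."""
    return DumpEntry(
        entry_type=row["type"],
        deleted=row["deleted"].lower() == "true",
        path=row["path"],
        is_auth_dir=row["is_auth_dir"].lower() == "true",
        is_empty_dir=tristate(row["is_empty_dir"]),
        length=int(row["len"]),
        updated=millis_to_utc(row["updated"]),
        last_modified=millis_to_utc(row["last_modified"]),
        etag=row["etag"],
        version=row["version"],
    )
```

Using timezone-aware UTC values avoids depending on the local zone in which the `*_s` strings (here BST) were rendered.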
+
+Files generated:
+
+| suffix | content |
+|---------------|---------|
+| `-scan.csv` | Full scan/dump of the metastore |
+| `-store.csv` | Recursive walk through the metastore |
+| `-tree.csv` | Treewalk through filesystem `listStatus("/")` calls |
+| `-flat.csv` | Flat listing through filesystem `listFiles("/", recursive)` |
+| `-s3.csv` | Dump of the S3 Store *only* |
+| `-scan-2.csv` | Scan of the store after the previous operations |
+
+Why two scans? The S3Guard+S3 listing and treewalk operations may add new
+entries to the store, so the second scan captures its state after they have run.
Review comment:
whitespace:end of line
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services