adoroszlai opened a new pull request, #6593: URL: https://github.com/apache/ozone/pull/6593
## What changes were proposed in this pull request?

Create a CLI tool to find EC keys whose blocks are missing from some of the replicas due to having been reconstructed before HDDS-10681 was fixed. The tool prints the affected keys as a tab-separated list showing key name, size, and replication. This information can be used to recreate the keys (if enough replicas are still available) using existing tools (`ozone sh key get/put`).

https://issues.apache.org/jira/browse/HDDS-10751

## How was this patch tested?

Configured Ozone's `restart` Docker Compose environment with some additional datanodes. Set 8MB block size, so EC 3-2 would use 1 block on each datanode hosting the EC container for each 24MB of total key size.

Created a set of EC keys spread out in 4 containers:

- container 1 had one key each of 1MB, 25MB, 49MB, 73MB (thus 2 padding blocks)
- container 2 had one key each of 2MB, 26MB, 50MB, 74MB (1 padding block)
- container 3 had one key each of 3MB, 27MB, 51MB, 75MB (no padding)
- container 4 had one key each of 4MB, 28MB, 52MB, 76MB (no padding)

Reproduced HDDS-10681 by stopping nodes hosting replica 2 of container 1 and replica 3 of container 2, and letting Ozone reconstruct all containers.

```
$ ozone debug fmp
Key                 Size      Replication
vol1/bucket2/50mb   52428800  rs-3-2-1024k
vol1/bucket1/1mb    1048576   rs-3-2-1024k
vol1/bucket2/26mb   27262976  rs-3-2-1024k
vol1/bucket2/2mb    2097152   rs-3-2-1024k
vol1/bucket1/73mb   76546048  rs-3-2-1024k
vol1/bucket2/74mb   77594624  rs-3-2-1024k
vol1/bucket1/49mb   51380224  rs-3-2-1024k
vol1/bucket1/25mb   26214400  rs-3-2-1024k
```

By setting `OZONE_LOGLEVEL`, additional details are logged about the process:

```
$ OZONE_LOGLEVEL=INFO ozone debug fmp
...
[main] INFO shell.Handler: Found 4 blocks missing from container 1 on replica 2 at a9c3b585-23bc-4180-85e4-0180cb1e6054(restart_dn7_1.restart_net/10.9.0.17)
[main] INFO shell.Handler: Found 4 blocks missing from container 2 on replica 3 at a9c3b585-23bc-4180-85e4-0180cb1e6054(restart_dn7_1.restart_net/10.9.0.17)
...
```

(To filter unrelated log messages, set level specifically for `log4j.logger.org.apache.hadoop.ozone.shell` in `log4j.properties`, and use `OZONE_LOGLEVEL=WARN` to enable logging for the CLI command.)

Keys recreated using get/put are no longer reported:

```
$ ozone debug fmp
Key                 Size      Replication
vol1/bucket2/50mb   52428800  rs-3-2-1024k
vol1/bucket2/2mb    2097152   rs-3-2-1024k
vol1/bucket2/74mb   77594624  rs-3-2-1024k
vol1/bucket2/26mb   27262976  rs-3-2-1024k
```
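
The padding counts quoted for the test keys can be checked with a little arithmetic. The sketch below is not part of the patch. It assumes the usual EC striping layout, where 1MB chunks are written round-robin across the 3 data replicas and a block group holds 3 x 8MB = 24MB of key data. The function name `padding_blocks` and its parameters are made up for illustration.

```python
MB = 1024 * 1024


def padding_blocks(key_size, data=3, chunk=MB, block=8 * MB):
    """Count data replicas that get no block for the key's last block group.

    Assumption: chunks are striped round-robin over `data` replicas, so a
    short final block group leaves the trailing data replicas without a block.
    """
    if key_size <= 0:
        raise ValueError("key size must be positive")
    group = data * block                 # 24MB of key data per block group
    last = key_size % group or group     # bytes stored in the final block group
    chunks = -(-last // chunk)           # ceiling division: chunks in that group
    used = min(data, chunks)             # data replicas that receive a chunk
    return data - used


# One row per test container described above.
for sizes in ([1, 25, 49, 73], [2, 26, 50, 74], [3, 27, 51, 75], [4, 28, 52, 76]):
    print(sizes, [padding_blocks(s * MB) for s in sizes])
```

Under these assumptions the keys in container 1 come out with 2 padding blocks, those in container 2 with 1, and those in containers 3 and 4 with none, which matches the setup described in the test.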
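
To illustrate how the report could feed the get/put recreation step, here is a minimal sketch that parses the tab-separated output and builds the corresponding command lines without running them. It is not part of the patch. The helper names, the `/tmp/recreate` scratch directory, and the leading `/` added to the key path are assumptions, and the `--type`/`--replication` options of `ozone sh key put` should be confirmed with `ozone sh key put --help` for the release in use.

```python
import shlex


def parse_report(text):
    """Return (key, size, replication) tuples from the tab-separated report."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip the header and anything that is not a 3-column row (e.g. log lines).
        if len(fields) != 3 or fields[0] == "Key":
            continue
        key, size, replication = fields
        rows.append((key, int(size), replication))
    return rows


def recreate_commands(key, replication, tmp_dir="/tmp/recreate"):
    """Build get/put commands for one key; tmp_dir is a hypothetical scratch dir."""
    # Flattening '/' to '_' is a simplification and may collide for unusual key names.
    local = f"{tmp_dir}/{key.replace('/', '_')}"
    uri = "/" + key  # assumption: the report prints volume/bucket/key without a leading slash
    return [
        ["ozone", "sh", "key", "get", uri, local],
        ["ozone", "sh", "key", "put", "--type", "EC",
         "--replication", replication, uri, local],
    ]


sample = "Key\tSize\tReplication\nvol1/bucket1/1mb\t1048576\trs-3-2-1024k\n"
for key, size, replication in parse_report(sample):
    for cmd in recreate_commands(key, replication):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them leaves room to review the list first, since recreation only works while enough replicas are still available to read the key.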
