adoroszlai opened a new pull request, #6593:
URL: https://github.com/apache/ozone/pull/6593

   ## What changes were proposed in this pull request?
   
   Add a CLI tool that finds EC keys whose blocks are missing from some of the
   replicas because those replicas were reconstructed before HDDS-10681 was
   fixed.
   
   The tool prints the affected keys as a tab-separated list showing key name,
   size, and replication.  This information can be used to recreate the keys
   (if enough replicas are still available) using existing tools
   (`ozone sh key get/put`).
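
As a rough illustration of how the tab-separated output could feed a get/put round trip, the sketch below parses the rows and builds the two command lines per key without running them. The helper names, the `/tmp/fmp` staging directory, and the leading `/` added to the key path are assumptions made for this example, not part of the patch. Replication options for `put` are deliberately left out; the parsed replication column is kept so the caller can decide how to apply it.

```python
import shlex


def parse_fmp_output(text):
    """Parse tab-separated 'Key / Size / Replication' rows into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.strip().split("\t")
        # Skip the header row, blank lines, and anything not in 3 columns.
        if len(fields) != 3 or fields[0] == "Key":
            continue
        key, size, replication = fields
        rows.append({"key": key, "size": int(size), "replication": replication})
    return rows


def recreate_commands(row, tmp_dir="/tmp/fmp"):
    """Build (not execute) the get and put commands for one affected key.

    tmp_dir is a hypothetical local staging directory.
    """
    local = f"{tmp_dir}/{row['key'].replace('/', '_')}"
    uri = "/" + row["key"]  # assumption: address the key as /volume/bucket/key
    return [
        f"ozone sh key get {shlex.quote(uri)} {shlex.quote(local)}",
        f"ozone sh key put {shlex.quote(uri)} {shlex.quote(local)}",
    ]


sample = "Key\tSize\tReplication\nvol1/bucket2/50mb\t52428800\trs-3-2-1024k\n"
for row in parse_fmp_output(sample):
    for command in recreate_commands(row):
        print(command)
```

Printing the commands instead of executing them leaves room to review the list first, for example to confirm that each key is still readable before it is overwritten.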
   
   https://issues.apache.org/jira/browse/HDDS-10751
   
   ## How was this patch tested?
   
   Configured Ozone's `restart` Docker Compose environment with some additional
   datanodes.  Set the block size to 8MB, so that with EC 3-2 each 24MB of key
   data (3 data blocks of 8MB) uses 1 block on each datanode hosting the EC
   container.
   
   Created a set of EC keys spread across 4 containers:
   
   - container 1 held keys of 1MB, 25MB, 49MB and 73MB (thus 2 padding blocks each)
   - container 2 held keys of 2MB, 26MB, 50MB and 74MB (1 padding block each)
   - container 3 held keys of 3MB, 27MB, 51MB and 75MB (no padding)
   - container 4 held keys of 4MB, 28MB, 52MB and 76MB (no padding)
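
The padding counts in the list above can be reproduced with a little arithmetic. This is only a sketch of the reasoning, assuming `rs-3-2-1024k` (3 data replicas, 1MB chunks written round-robin) and the 8MB block size used in this test; the function name and parameters are made up for the example and do not come from the Ozone code base.

```python
import math

MB = 1024 * 1024


def padding_blocks(key_size, data=3, chunk=1 * MB, block=8 * MB):
    """Count data replicas that receive no bytes in the key's last block group."""
    if key_size <= 0:
        raise ValueError("key_size must be positive")
    group_capacity = data * block  # 24MB of key data per block group
    # Bytes that land in the last block group (a full group if evenly divisible).
    last_group = key_size % group_capacity or group_capacity
    # Chunks are striped across the data replicas, so the number of replicas
    # holding data in the last group is the chunk count, capped at `data`.
    replicas_with_data = min(data, math.ceil(last_group / chunk))
    return data - replicas_with_data


for size_mb in (1, 25, 2, 26, 3, 27, 4, 28):
    print(f"{size_mb}MB -> {padding_blocks(size_mb * MB)} padding block(s)")
```

Under these assumptions, only the key sizes used in containers 1 and 2 yield a non-zero count.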
   
   Reproduced HDDS-10681 by stopping nodes hosting replica 2 of container 1 and 
replica 3 of container 2, and letting Ozone reconstruct all containers.
   
   ```
   $ ozone debug fmp
   Key  Size    Replication
   vol1/bucket2/50mb    52428800        rs-3-2-1024k
   vol1/bucket1/1mb     1048576 rs-3-2-1024k
   vol1/bucket2/26mb    27262976        rs-3-2-1024k
   vol1/bucket2/2mb     2097152 rs-3-2-1024k
   vol1/bucket1/73mb    76546048        rs-3-2-1024k
   vol1/bucket2/74mb    77594624        rs-3-2-1024k
   vol1/bucket1/49mb    51380224        rs-3-2-1024k
   vol1/bucket1/25mb    26214400        rs-3-2-1024k
   ```
   
   Setting `OZONE_LOGLEVEL` logs additional details about the process:
   
   ```
   $ OZONE_LOGLEVEL=INFO ozone debug fmp
   ...
   [main] INFO shell.Handler: Found 4 blocks missing from container 1 on replica 2 at a9c3b585-23bc-4180-85e4-0180cb1e6054(restart_dn7_1.restart_net/10.9.0.17)
   [main] INFO shell.Handler: Found 4 blocks missing from container 2 on replica 3 at a9c3b585-23bc-4180-85e4-0180cb1e6054(restart_dn7_1.restart_net/10.9.0.17)
   ...
   ```
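
If the INFO messages need to be summarized, something along these lines could total the missing blocks per container, replica, and datanode. The regular expression is inferred only from the two sample lines above, so it is an assumption about the message format rather than a guaranteed contract; `summarize` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample log lines; the real format may differ.
PATTERN = re.compile(
    r"Found (?P<count>\d+) blocks missing from container (?P<container>\d+) "
    r"on replica (?P<replica>\d+) at (?P<uuid>[0-9a-f-]+)\((?P<host>[^)]*)\)"
)


def summarize(log_text):
    """Return {(container, replica, datanode_uuid): total_missing_blocks}."""
    # Collapse whitespace so messages wrapped across lines still match.
    normalized = " ".join(log_text.split())
    totals = defaultdict(int)
    for match in PATTERN.finditer(normalized):
        key = (int(match["container"]), int(match["replica"]), match["uuid"])
        totals[key] += int(match["count"])
    return dict(totals)


log = (
    "[main] INFO shell.Handler: Found 4 blocks missing from container 1 on \n"
    "replica 2 at \n"
    "a9c3b585-23bc-4180-85e4-0180cb1e6054(restart_dn7_1.restart_net/10.9.0.17)\n"
)
print(summarize(log))
```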
   
   (To filter out unrelated log messages, set the level specifically for
   `log4j.logger.org.apache.hadoop.ozone.shell` in `log4j.properties`, and use
   `OZONE_LOGLEVEL=WARN` to enable logging for the CLI command.)
   
   Keys recreated using get/put are no longer reported:
   
   ```
   $ ozone debug fmp
   Key  Size    Replication
   vol1/bucket2/50mb    52428800        rs-3-2-1024k
   vol1/bucket2/2mb     2097152 rs-3-2-1024k
   vol1/bucket2/74mb    77594624        rs-3-2-1024k
   vol1/bucket2/26mb    27262976        rs-3-2-1024k
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

