errose28 commented on PR #7548:
URL: https://github.com/apache/ozone/pull/7548#issuecomment-2635412705
I like the idea of being able to quickly check block existence without
downloading all the data. We are working on a similar set of tools that might
be useful in HDDS-12206. The final implementation will look something like this:
- Command `ozone debug replicas verify <URI>... [--checksums] [--metadata]
  [--padding]` where:
  - `ozone debug replicas verify` does one iteration through the namespace
    provided in `<URI>`. It may use multiple threads to work through all the keys.
  - For each key under `<URI>`, the client uses the block locations from OM
    without consulting SCM.
  - The client uses the block info to run checks on the replicas as specified
    by the flags:
    - `--checksums`: Download data from the datanode to do client-side
      checksum verification. Replaces the `read-replicas` command.
    - `--padding`: Check for missing EC padding. Replaces the
      `find-missing-padding` command and is a no-op for non-EC keys.
    - `--metadata` (or other name): Only check block existence by making
      `GetBlock` calls to the datanodes without downloading data.
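
To make the flow above concrete, here is a rough sketch of the per-key loop with flag-selected checks. It is only an illustration, not the HDDS-12206 implementation: `Key`, `Replica`, `CHECKS`, and `verify` are hypothetical names, the boolean fields stand in for real datanode calls, and the `--padding` check is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for one block replica on one datanode."""
    datanode: str
    block_id: str
    exists: bool = True        # stands in for a successful GetBlock call
    checksum_ok: bool = True   # stands in for client-side checksum verification


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for a key and the block locations OM returned."""
    name: str
    replicas: List[Replica]


Check = Callable[[Replica], bool]

# One entry per flag. Each check returns True when the replica passes.
CHECKS: Dict[str, Check] = {
    "metadata": lambda r: r.exists,
    "checksums": lambda r: r.exists and r.checksum_ok,
}


def verify(keys: Iterable[Key], enabled: List[str]) -> List[str]:
    """Walk the keys once and run every enabled check on every replica."""
    failures: List[str] = []
    for key in keys:
        for replica in key.replicas:
            for name in enabled:
                if not CHECKS[name](replica):
                    failures.append(
                        f"{key.name} {replica.block_id}@{replica.datanode}: "
                        f"{name} failed"
                    )
    return failures
```

In this sketch, a new check is just another entry in `CHECKS`, and the single pass over the keys is the same no matter which flags are set.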
We could add another flag like `--container-state` or similar to check
the SCM container state for each container referenced in the keys and raise an
error if it is deleted or has no replicas. This check would not involve the
datanodes. I think this would encapsulate the requirements in this doc,
although the underlying implementation is different. Using SCM as described in
this doc may have some issues:
- The client will need block tokens to get datanode block metadata, and
these are currently only generated by the OM, and then by datanodes for
verifications.
- We would need to get either block or container tokens to the client
from SCM if we want to use SCM's container data to query the datanodes.
- Building a reverse container -> key mapping is an expensive task. On a
large namespace this will not fit in memory and will need to spill to disk
somewhere. It will likely make the command more expensive than just calling
`GetBlock` on the datanodes for each block in the keys.
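
As a rough illustration of the cost difference in the last point, the sketch below contrasts the two traversals. It assumes each key is a name plus a list of `(container id, local block id)` pairs. `build_reverse_index` and `iter_block_checks` are hypothetical helpers, not existing Ozone code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# (key name, [(container id, local block id), ...]); an assumed shape.
KeyBlocks = Tuple[str, List[Tuple[int, int]]]


def build_reverse_index(keys: Iterable[KeyBlocks]) -> Dict[int, Set[str]]:
    """Container -> key mapping. It holds every key name in memory at once,
    so its size grows with the whole namespace."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for name, blocks in keys:
        for container_id, _local_id in blocks:
            index[container_id].add(name)
    return dict(index)


def iter_block_checks(keys: Iterable[KeyBlocks]) -> Iterator[Tuple[str, int, int]]:
    """Forward pass. It yields one (key, container, block) tuple at a time,
    so only the current key needs to be held in memory."""
    for name, blocks in keys:
        for container_id, local_id in blocks:
            yield name, container_id, local_id
```

Each tuple from `iter_block_checks` corresponds to one `GetBlock` call, so the forward pass trades extra datanode requests for constant client-side state.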
Let me know what you think of the approach in HDDS-12206 and if it seems
like it could meet the requirements in this doc.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]