xichen01 commented on PR #7548: URL: https://github.com/apache/ozone/pull/7548#issuecomment-2643378355
> Command ozone debug replicas verify <URI>... [--checksums] [--metadata] [--padding] where
> .....
> - --metadata (or other name): Only check block existence by GetBlock calls to the datanodes without downloading data.
>
> We could add this another flag like --container-state or similar to check the SCM container state for each container reference...
> .....

This can be a similar method to check if the key is readable without downloading the key.

- The container status check cannot be bypassed. You need to execute getContainer before executing GetBlock, because only normal containers can serve GetBlock.
- `GetBlock` does not confirm the existence of the Block file itself. It only reads the Block metadata from the DB, so `GetBlock` cannot reliably check whether the block exists.
- The `headBlocks` mentioned in the document checks both the DB and the Block files, and its return value clearly indicates whether the metadata is missing, the Block file is missing, or some other exception occurred. We can also let it check whether the size of the Block file matches the size recorded in the OM (this has a high probability of discovering abnormal overwrites, such as SCM assigning duplicate BlockIDs https://github.com/apache/ozone/pull/5018).

> Building a reverse container -> key mapping is an expensive task. On a large namespace this will not fit in memory and need to spill to disk somewhere. It will likely make the command more expensive than just calling GetBlock on the datanodes for each block in the keys.

- We only need to keep in memory the container -> key mapping for the keys currently being checked, not for all keys. We can control the number of keys being checked through parameters, so this memory consumption is controllable.
- The batch processing check mentioned in this document `headBlock` can achieve very efficient checking.
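
To make the two points above concrete, here is a minimal, language-neutral sketch in Python. It is not Ozone code: the datanode is modelled with plain in-memory sets and dicts, and every name in it (`BlockStatus`, `verify_keys`, `readable_containers`, `block_db`, `block_files`) is hypothetical. It only illustrates the idea of building the container -> key mapping one batch at a time, checking the container before its blocks, and reporting missing metadata, a missing block file, and a size mismatch as distinct results.

```python
from collections import defaultdict
from enum import Enum
from itertools import islice


class BlockStatus(Enum):
    # Hypothetical result codes, not the real headBlocks return values.
    OK = "ok"
    CONTAINER_NOT_READABLE = "container not readable"
    MISSING_METADATA = "missing metadata"
    MISSING_BLOCK_FILE = "missing block file"
    SIZE_MISMATCH = "size mismatch"


def batches(items, size):
    """Yield lists of at most `size` items, one batch at a time."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def group_by_container(key_batch):
    """Build the container -> blocks mapping for the current batch only."""
    by_container = defaultdict(list)
    for key_name, blocks in key_batch:
        for container_id, local_id, expected_size in blocks:
            by_container[container_id].append((key_name, local_id, expected_size))
    return by_container


def check_block(block_id, expected_size, block_db, block_files):
    """Check the metadata first, then the block file, then its size."""
    if block_id not in block_db:
        return BlockStatus.MISSING_METADATA
    if block_id not in block_files:
        return BlockStatus.MISSING_BLOCK_FILE
    if block_files[block_id] != expected_size:
        return BlockStatus.SIZE_MISMATCH
    return BlockStatus.OK


def verify_keys(keys, readable_containers, block_db, block_files, batch_size=2):
    """Return (key, block_id, status) for every block that is not OK.

    keys: iterable of (key_name, [(container_id, local_id, expected_size), ...])
    readable_containers: container IDs assumed to be in a normal state
    block_db: set of block IDs that have metadata (stand-in for the datanode DB)
    block_files: block ID -> on-disk size (stand-in for the block files)
    """
    problems = []
    for key_batch in batches(keys, batch_size):
        # Memory use is bounded by batch_size, not by the namespace size.
        for container_id, entries in group_by_container(key_batch).items():
            container_ok = container_id in readable_containers
            for key_name, local_id, expected_size in entries:
                block_id = (container_id, local_id)
                if not container_ok:
                    # The container check is never skipped.
                    status = BlockStatus.CONTAINER_NOT_READABLE
                else:
                    status = check_block(block_id, expected_size, block_db, block_files)
                if status is not BlockStatus.OK:
                    problems.append((key_name, block_id, status))
    return problems
```

In this sketch `batch_size` plays the role of the parameter that limits how many keys are checked at once, and grouping by container means each container's state is looked up once per batch rather than once per block.
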
The internal version we are currently using (which uses `listKeys` to obtain metadata from OM) can reach a speed of 70k/s with no noticeable impact on online users. The bottleneck is OM's `listKeys`; if `listKeysLight` is used, it can reach a higher speed.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
