xichen01 commented on PR #6774:
URL: https://github.com/apache/ozone/pull/6774#issuecomment-2154052199
@errose28 Thanks for your detailed response.
### What is the overall goal?
The goal is to eventually implement a command-line tool that can quickly find
missing keys.
Here, a "missing key" is a key that exists in OM but whose blocks are missing.
### What is the context
There are two scenarios that require this tool:
- We are performing an Orphan Block cleanup with an external tool, and
"find missing keys" can be used to confirm that no keys in the cluster were
lost as a result of that cleanup.
- We have found missing keys in the cluster: reading them fails with
"NO_REPLICA_FOUND" or "Unable to find the block". Our cluster has been running
for several years and has gone through many releases, and because of some
historical bugs, some keys are missing and we need to be able to find them.

We can't guarantee that there won't be other causes of data loss in the
future, so we need a tool that can quickly scan the entire cluster and confirm
that no keys are missing.
### There looks to be overlap with other features
"Find missing keys", "container reconciliation", and the "Datanode scanner"
have different purposes.
- "Find missing keys" finds keys whose blocks may be missing.
  - This part may overlap with HDDS-9346, but the development timeline of that
  work is uncertain.
- Container reconciliation is more about solving the Container data
consistency problem.
- The Datanode scanner ensures the reliability of Datanode data: it can find
checksum errors or data lost from disk, but it can't find blocks that are
supposed to be in a Container and are not actually there.
### The scanner also accounts for nuances like container state, block deletion that modifies disk and DB in multiple steps
I think "find missing keys" doesn't need to take things like Container state
into account, because its purpose is to find keys whose Block replicas are
completely missing. As long as any replica of a key's blocks still exists, the
key is not missing from the cluster and can be recovered in some way.
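To make that rule concrete, here is a minimal sketch of the check I have in mind. It is not Ozone code: `MissingKeyFinder`, the plain string IDs, and the two maps are hypothetical stand-ins for the OM key table and for whatever the Datanodes report. It assumes that a key counts as missing as soon as one of its blocks has no replica on any Datanode, and it deliberately ignores Container state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the "find missing keys" rule; not part of Ozone. */
final class MissingKeyFinder {

  private MissingKeyFinder() {
  }

  /**
   * Returns the keys that have at least one block with no replica anywhere.
   *
   * @param keyToBlocks key name to the block IDs recorded for it in OM
   *                    (illustrative representation)
   * @param blockToReplicas block ID to the Datanodes that report a replica
   */
  static List<String> findMissingKeys(
      Map<String, List<String>> keyToBlocks,
      Map<String, Set<String>> blockToReplicas) {
    List<String> missing = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : keyToBlocks.entrySet()) {
      for (String blockId : entry.getValue()) {
        // Assumption: one surviving replica is enough, whatever the
        // Container state is, because the key can still be recovered.
        Set<String> replicas =
            blockToReplicas.getOrDefault(blockId, Set.of());
        if (replicas.isEmpty()) {
          missing.add(entry.getKey());
          break; // one lost block is enough to flag the key
        }
      }
    }
    return missing;
  }
}
```

Only existence is consulted here, so the check can stay cheap; replica health and data correctness are left to the other mechanisms.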
### Does “find missing key” require reading and checking of data?
"Find missing keys" only verifies existence, not correctness; correctness can
be guaranteed by the Datanode scanner and checksums.
### The "head request" terminology is kind of confusing in this context
Makes sense. I think we can change the name of the API; maybe we can rename
it to `verifyBlocksExistence`.
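
As a rough illustration of what an existence-only call could look like after the rename, here is a sketch. Only the method name `verifyBlocksExistence` comes from this discussion; the interface, the batch-of-IDs signature, the `Map` return type, and the in-memory implementation are assumptions for illustration and do not reflect the actual API in this PR.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical shape of an existence-only check; not the PR's real API. */
interface BlockExistenceVerifier {
  /**
   * Reports, for each requested block ID, whether the block is present.
   * No block data is read and no checksum is verified.
   */
  Map<String, Boolean> verifyBlocksExistence(Collection<String> blockIds);
}

/** In-memory stand-in for a Datanode, used only to exercise the interface. */
final class InMemoryBlockExistenceVerifier implements BlockExistenceVerifier {
  private final Set<String> knownBlocks;

  InMemoryBlockExistenceVerifier(Set<String> knownBlocks) {
    this.knownBlocks = Set.copyOf(knownBlocks);
  }

  @Override
  public Map<String, Boolean> verifyBlocksExistence(
      Collection<String> blockIds) {
    // Preserve request order so callers can line results up with input.
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (String blockId : blockIds) {
      result.put(blockId, knownBlocks.contains(blockId));
    }
    return result;
  }
}
```

Batching the IDs in one call is a design guess on my side: it would keep the number of round trips low when scanning a whole cluster.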