[ https://issues.apache.org/jira/browse/HDDS-12207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937254#comment-17937254 ]
Ethan Rose commented on HDDS-12207:
-----------------------------------
Thanks for looking at this [~sarvekshayr]. I took a stab at some ideas for the output too and came up with something like this:
{code}
{
  "keys": [
    {
      // Split volume and bucket for easier post-processing. This also matches the key list output format.
      "volumeName": "vol",
      "bucketName": "bucket",
      "name": "HISTORY.md",
      // Set to false if any of the replicas' checks failed, or if any block had no replicas found.
      "pass": false,
      "blocks": [
        {
          "containerID": 1,
          "blockID": 123,
          "replicas": [
            {
              "datanode": {
                "uuid": "123-456",
                "hostname": "dn1"
              },
              "checks": [
                {
                  "type": "checksums",
                  "pass": false,
                  "failures": [
                    {
                      "present": true, // The block was found in this container replica.
                      // Comes from the checksum exception thrown by the block input stream.
                      // May also be a block-not-found error.
                      "message": "Inconsistent read for chunk=123 len=10 bytesRead=5"
                    }
                  ]
                },
                {
                  "type": "block existence",
                  "pass": false,
                  "failures": [
                    {
                      // The getBlock call may fail for a reason other than the block being missing.
                      // We can write that reason here.
                      "message": ""
                    }
                  ]
                },
                {
                  "type": "container states",
                  "pass": false,
                  "failures": [
                    {
                      // This check queries both SCM and the replicas, so in this layout the SCM state
                      // would end up duplicated in each replica's output.
                      // SCM states of DELETING or DELETED would trigger a failure. Missing containers
                      // would already have an empty replica list, as described above.
                      "scmState": "CLOSED",
                      "present": true,
                      // Use the datanodes' readContainer API instead of SCM's getContainer API for the
                      // most up-to-date info.
                      // UNHEALTHY would currently be the only replica state to count as a failure.
                      "replicaState": "UNHEALTHY"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  // Populated at the end to quickly see whether there were any failures (false here, since the key above failed).
  "pass": false
}
{code}
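The comments above define how the {{pass}} fields roll up. Here is a rough Python sketch of that aggregation, which fills in the key-level and top-level {{pass}} fields from the per-check results. The real tool is Java, so this is only an illustration. It assumes the output has already been parsed into plain dicts that use the field names proposed above. The // comments are only annotations, since JSON has no comment syntax. The function names are hypothetical and not part of Ozone.

```python
from typing import Any, Dict

# Hypothetical helpers: field names follow the proposed JSON layout,
# but these functions are illustrations only, not Ozone code.


def replica_passed(replica: Dict[str, Any]) -> bool:
    """A replica passes only if every check run against it passed."""
    return all(check["pass"] for check in replica["checks"])


def block_passed(block: Dict[str, Any]) -> bool:
    """A block fails if no replicas were found or any replica failed a check."""
    replicas = block["replicas"]
    return len(replicas) > 0 and all(replica_passed(r) for r in replicas)


def populate_pass_flags(report: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in each key's "pass" field, then the top-level "pass" field."""
    for key in report["keys"]:
        key["pass"] = all(block_passed(b) for b in key["blocks"])
    # Computed last, so it summarizes every key in the report.
    report["pass"] = all(key["pass"] for key in report["keys"])
    return report
```

The top-level field is computed last, so it is purely a summary of the key results. A consumer can check that single boolean before walking the rest of the report.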
We can make the output print minimal information by default and use additional flags to add information. For example, instead of adding {{--failures-only}} to print only the failing keys, we can print only failures by default. Passing {{--all}} would print results for all keys, passing and failing. The extra {{failures}} information would be omitted unless {{--verbose}} is passed.
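Here is a minimal sketch of those proposed defaults, again in Python and again hypothetical. The {{show_all}} and {{verbose}} parameters stand in for the proposed {{--all}} and {{--verbose}} flags, which are only suggestions at this point. The sketch assumes the {{pass}} fields have already been populated.

```python
import copy
from typing import Any, Dict


def filter_report(report: Dict[str, Any], show_all: bool = False,
                  verbose: bool = False) -> Dict[str, Any]:
    """Apply the proposed output defaults to a fully populated report.

    show_all stands in for the proposed --all flag and verbose for --verbose;
    both names are assumptions for this sketch.
    """
    result = copy.deepcopy(report)  # leave the caller's report untouched
    if not show_all:
        # Default: only failing keys are printed.
        result["keys"] = [k for k in result["keys"] if not k["pass"]]
    if not verbose:
        # Drop the detailed "failures" lists, keeping each check's type and pass flag.
        for key in result["keys"]:
            for block in key.get("blocks", []):
                for replica in block.get("replicas", []):
                    for check in replica.get("checks", []):
                        check.pop("failures", None)
    return result
```

The sketch filters a copy and never changes the top-level {{pass}}. That field therefore still reflects every key, even when passing keys are hidden from the output.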
> Unify output of `ozone debug replicas verify` checks
> ----------------------------------------------------
>
> Key: HDDS-12207
> URL: https://issues.apache.org/jira/browse/HDDS-12207
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Sarveksha Yeshavantha Raju
> Priority: Major
>
> Make {{ozone debug replicas verify}} output json information about each key
> and the checks that were run on it. This could optionally be streamed to
> stdout or broken up into multiple files as specified by the user. As new
> checks are added, their results will be included in the same json objects. We
> can also add an option to skip output for keys that passed all the specified
> checks.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)