Ethan Rose created HDDS-10110:
---------------------------------
Summary: Use RocksDB key count estimates instead of OM metrics file
Key: HDDS-10110
URL: https://issues.apache.org/jira/browse/HDDS-10110
Project: Apache Ozone
Issue Type: Sub-task
Components: OM
Reporter: Ethan Rose
Assignee: Ethan Rose
HDDS-816 added a json file in the OM to persist metrics such as key count.
The Jira has a doc attached that compares some options and concludes that
periodically flushing to a json file is the best approach. However, it overlooks
several issues with saving metrics this way:
* Error handling was missed. See HDDS-10094
* OMs' metrics can diverge if OMs are restarted at different times between
flushes of the file.
* On snapshot install on a follower, the metric will be [reset to the estimated
row count|https://github.com/apache/ozone/blob/14e7ff1e6fb2bf11f1df054c63b6e1729e328286/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java#L4006]
anyway. This follower's metrics will then have diverged from the other OMs.
* When metrics for various OMs diverge, they will show different lines in
dashboarding applications like Grafana, which may be confusing for users.
* Restoring the metric to a correct value after bugs like HDDS-10063 requires
some sort of manual repair.
* Once metrics diverge between OMs, even a restart will not bring them back in
sync.
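The divergence described in the list above can be reproduced with a toy model. This is a simplified sketch, not Ozone code: {{ToyOm}}, its {{persisted}} field (standing in for the json file), and all method names are hypothetical. It assumes each OM counts applied operations in memory, flushes the count periodically, and reloads the last flushed value on restart.

```java
// Toy model of a metric persisted by periodic flush. Not Ozone code:
// ToyOm and all of its members are hypothetical names for illustration.
final class ToyOm {
  private long inMemory;   // metric value reported to dashboards
  private long persisted;  // stands in for the value stored in the json file

  void applyCreateKey() { inMemory++; }

  void flush() { persisted = inMemory; }

  // A restart loses every increment made since the last flush.
  void restart() { inMemory = persisted; }

  long keyCount() { return inMemory; }
}

final class DivergenceDemo {
  public static void main(String[] args) {
    ToyOm leader = new ToyOm();
    ToyOm follower = new ToyOm();

    // Both OMs apply the same 10 operations, then flush.
    for (int i = 0; i < 10; i++) {
      leader.applyCreateKey();
      follower.applyCreateKey();
    }
    leader.flush();
    follower.flush();

    // Both apply 5 more operations; only the follower restarts
    // before the next flush.
    for (int i = 0; i < 5; i++) {
      leader.applyCreateKey();
      follower.applyCreateKey();
    }
    follower.restart();
    // leader=15, follower=10
    System.out.println("leader=" + leader.keyCount()
        + ", follower=" + follower.keyCount());

    // Each OM persists its own value, so later flushes and restarts
    // keep the gap: still leader=15, follower=10.
    leader.flush();
    follower.flush();
    leader.restart();
    follower.restart();
    System.out.println("leader=" + leader.keyCount()
        + ", follower=" + follower.keyCount());
  }
}
```

In this model both OMs applied identical operations, yet they report different counts, and nothing in the flush and reload cycle ever reconciles them.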
[HDDS-1829|https://issues.apache.org/jira/browse/HDDS-1829] later added the
ability for some metrics to be updated based on RocksDB key count estimates.
See {{Q: How to know the number of keys stored in a RocksDB database?}} in the
[RocksDB FAQ|https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ]. These
metrics survive restart using the key count estimate and do not use the metrics
json file, so we have two divergent implementations. However, once these
metrics are updated on startup, they are not incremented as new OM operations
come in.
This Jira proposes to:
# Get rid of the OM metrics json file.
# Use key count estimates for all metrics that must survive a restart.
# Continue to update these metrics as OM requests come in.
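The three steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the proposed Ozone implementation: {{EstimatedKeyCountMetric}} and its methods are hypothetical names, and the estimate is injected as a {{LongSupplier}} so the example runs without RocksDB. In a real OM the supplier might wrap RocksJava's {{getLongProperty}} with the {{rocksdb.estimate-num-keys}} property for the relevant column family.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: a metric seeded from a key count estimate and
// then kept up to date as requests are applied. No json file involved.
final class EstimatedKeyCountMetric {
  private final AtomicLong value = new AtomicLong();
  private final LongSupplier estimate;

  // estimate: assumed source of the RocksDB key count estimate, e.g. a
  // wrapper around db.getLongProperty(cf, "rocksdb.estimate-num-keys").
  EstimatedKeyCountMetric(LongSupplier estimate) {
    this.estimate = estimate;
    resetFromEstimate();
  }

  // Intended for startup and for after a snapshot install.
  void resetFromEstimate() {
    value.set(Math.max(0L, estimate.getAsLong()));
  }

  void onKeysCreated(long n) {
    value.addAndGet(n);
  }

  // Clamp at zero, since the seed is only an estimate.
  void onKeysDeleted(long n) {
    value.updateAndGet(v -> Math.max(0L, v - n));
  }

  long get() {
    return value.get();
  }
}

final class EstimatedKeyCountDemo {
  public static void main(String[] args) {
    long[] fakeEstimate = {1000L};  // stand-in for the DB estimate
    EstimatedKeyCountMetric keys =
        new EstimatedKeyCountMetric(() -> fakeEstimate[0]);

    keys.onKeysCreated(3);
    keys.onKeysDeleted(1);
    System.out.println(keys.get());  // 1002

    // After a restart or snapshot install, reseed from the DB estimate.
    fakeEstimate[0] = 990L;
    keys.resetFromEstimate();
    System.out.println(keys.get());  // 990
  }
}
```

Because every OM reseeds from its own DB rather than from a separately flushed file, a restart or snapshot install pulls the metric back toward the DB contents instead of preserving an earlier drift.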
While the RocksDB estimated key count will not be totally accurate, the
json-based approach will not be either. The RocksDB approach is easier to
maintain, both in the amount of code required and in the effort needed to fix
metric counting bugs.