[
https://issues.apache.org/jira/browse/HDDS-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Pogde updated HDDS-330:
--------------------------------
Target Version/s: 1.2.0
I am managing the 1.1.0 release and we currently have more than 600 issues
targeted for 1.1.0. I am moving the target field to 1.2.0.
If you are actively working on this jira and believe it should be targeted for
the 1.1.0 release, please change the target field back to 1.1.0 before Feb 05,
2021.
> Ozone: number of keys/values/buckets to KSMMetrics
> --------------------------------------------------
>
> Key: HDDS-330
> URL: https://issues.apache.org/jira/browse/HDDS-330
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Manager
> Reporter: Marton Elek
> Priority: Major
>
> During my last ozone test with a 100-node ozone cluster I found it hard to
> track how many keys/volumes/buckets I have.
> I opened this jira to start a discussion about extending the KSM metrics (but
> let me know if this is already planned somewhere else) and adding the number
> of keys/volumes/buckets to the metrics interface.
> These counters could be exposed anywhere else as well (for example via a
> client call), but I think they are important numbers and worth monitoring.
> I see multiple ways to achieve it:
> 1. Extend the `org.apache.hadoop.utils.MetadataStore` class with an additional
> count() method. As far as I know there is no easy way to implement it with
> leveldb, but with rocksdb there is a possibility to get the _estimated_ number
> of keys. On the other hand, KSM stores volumes/buckets/keys in the same db, so
> we can't use it without splitting ksm.db into separate dbs.
> 2. Create a background task that iterates over all the keys and counts the
> ozone key/volume/bucket numbers:
> pro: it would be independent of the existing program flow
> con: it doesn't provide up-to-date information.
> con: it uses more resources, since it scans the whole db frequently
> 3. During startup we can iterate over the whole ksm.db and count the current
> metrics, then update the numbers on every subsequent create/delete call. It
> uses additional resources during startup (we should measure how long it takes
> to scan a db with millions of keys), but after that it would be fast. We
> could also introduce a new configuration variable to skip the initial scan;
> in that case the numbers would be valid only since the last restart, but
> startup would be fast.
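To make the trade-off in option 2 concrete, here is a minimal, hypothetical sketch of such a background counting task. The db iterator is simulated with a key supplier, and none of the names (`BackgroundKeyCounter`, `dbKeySupplier`) come from the actual KSM code:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of approach 2: a periodic background task that rescans the whole
// db and republishes the count. The count is only as fresh as the last scan,
// and each scan is O(total keys), which is the main cost of this approach.
public class BackgroundKeyCounter {
  private volatile long numKeys; // last published count
  private final Supplier<List<String>> dbKeySupplier; // stand-in for a db iterator
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public BackgroundKeyCounter(Supplier<List<String>> dbKeySupplier) {
    this.dbKeySupplier = dbKeySupplier;
  }

  /** Full scan over all db keys; invoked periodically by the scheduler. */
  void rescan() {
    long count = 0;
    for (String ignored : dbKeySupplier.get()) {
      count++;
    }
    numKeys = count;
  }

  /** Starts the periodic rescan with the given interval. */
  public void start(long intervalSeconds) {
    scheduler.scheduleAtFixedRate(this::rescan, 0, intervalSeconds,
        TimeUnit.SECONDS);
  }

  public long getNumKeys() {
    return numKeys;
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
```

The `volatile` publish means readers never block, but between scans the value can drift arbitrarily far from the truth, which is exactly the "not up-to-date" con above.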
> I suggest using the 3rd approach; could you please comment with your
> opinion?
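A minimal sketch of the suggested 3rd approach, assuming a hypothetical "/volume/bucket/key" path layout for the ksm.db keys (the real key encoding may differ), with a one-time startup scan and atomic counters updated from the create/delete code paths:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of approach 3: count volumes/buckets/keys once at startup, then
// keep the counters current on every create/delete call. The slash-based
// classification below is a hypothetical stand-in for the real key layout.
public class KsmObjectCounters {
  private final AtomicLong numVolumes = new AtomicLong();
  private final AtomicLong numBuckets = new AtomicLong();
  private final AtomicLong numKeys = new AtomicLong();

  /** One-time scan over all db keys during startup (skippable by config). */
  public void initFromScan(Iterable<String> dbKeys) {
    for (String k : dbKeys) {
      // Classify by path depth: /vol -> volume, /vol/bucket -> bucket,
      // /vol/bucket/key (or deeper) -> key.
      int slashes = k.length() - k.replace("/", "").length();
      if (slashes == 1) {
        numVolumes.incrementAndGet();
      } else if (slashes == 2) {
        numBuckets.incrementAndGet();
      } else if (slashes >= 3) {
        numKeys.incrementAndGet();
      }
    }
  }

  // Hooks to be called from the existing create/delete code paths.
  public void incNumVolumes() { numVolumes.incrementAndGet(); }
  public void incNumBuckets() { numBuckets.incrementAndGet(); }
  public void incNumKeys()    { numKeys.incrementAndGet(); }
  public void decNumKeys()    { numKeys.decrementAndGet(); }

  public long getNumVolumes() { return numVolumes.get(); }
  public long getNumBuckets() { return numBuckets.get(); }
  public long getNumKeys()    { return numKeys.get(); }
}
```

After the initial scan the per-call updates are O(1), so the only cost that needs measuring is the startup iteration itself.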
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]