Hello everyone,

I am a software engineer at Microsoft, and our team is currently working on
making the deployment and operation of Accumulo on Azure as seamless as
possible. As part of this effort, we are trying to observe and measure
some standard Accumulo operations (e.g. scans, canary queries, and ingest)
and how their performance varies over time on long-running Accumulo
clusters in Azure. Along the same lines, we’re looking to come up with a
metric that we can use to evaluate how healthy and available an Accumulo
cluster is. Over time, we intend to use this metric to understand how
changes to the underlying Azure platform affect the overall health of
Accumulo workloads.

As a starting metric, for example, we are thinking of continually scanning
random values across various tablet servers and recording how long those
scans take. I took a quick look at the accumulo-testing repo and didn’t
find any tests or probes that do something along these lines. Does this
approach seem reasonable? Has anyone attempted something similar before?
Would accumulo-testing be a reasonable place for code like this?

I’d appreciate your thoughts and feedback.

Cheers,

Tushar Dhadiwal