Hello Everyone,
I am a Software Engineer at Microsoft, and our team is working on making the deployment and operation of Accumulo on Azure as seamless as possible. As part of this effort, we are trying to measure some standard Accumulo operations (e.g. scans, canary queries, ingest) and how their performance varies over time on long-running Accumulo clusters in Azure. We would also like to come up with a metric for evaluating how healthy and available an Accumulo cluster is. Over time, we intend to use this metric to understand how underlying platform changes in Azure affect the overall health of Accumulo workloads.

As a starting metric, we are thinking of continually scanning for random values across the various tablet servers and capturing how long those scans take. I took a quick look at the accumulo-testing repo and didn’t find any tests or probes that do something along these lines.

Does something like this seem reasonable? Has anyone previously attempted something similar? Does accumulo-testing seem like a reasonable place for code like this?

I'd appreciate your thoughts and feedback.

Cheers,
Tushar Dhadiwal
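
P.S. To make the starting metric a bit more concrete, here is a rough Python sketch of the kind of probe loop I have in mind: pick random keys, time each scan, and summarize availability and latency. Everything named here (`run_probe`, `fake_scan`, the `row_N` keys) is a hypothetical placeholder rather than anything from accumulo-testing or the Accumulo client API; a real probe would replace `fake_scan` with an actual scan against the cluster, and the choice of median and 95th percentile is only an assumption about which numbers would be useful.

```python
import math
import random
import statistics
import time
from typing import Callable, Dict, List


def run_probe(scan: Callable[[str], None], keys: List[str], samples: int,
              rng: random.Random) -> Dict[str, float]:
    """Time `samples` scans of randomly chosen keys and summarize the results."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    latencies_ms: List[float] = []
    failures = 0
    for _ in range(samples):
        key = rng.choice(keys)
        start = time.perf_counter()
        try:
            scan(key)
        except Exception:
            # Any exception counts as a failed probe; no latency is recorded.
            failures += 1
            continue
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    summary: Dict[str, float] = {
        "samples": float(samples),
        "availability": (samples - failures) / samples,
    }
    if latencies_ms:
        ordered = sorted(latencies_ms)
        summary["median_ms"] = statistics.median(ordered)
        # Nearest-rank 95th percentile over the successful scans.
        idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
        summary["p95_ms"] = ordered[idx]
    return summary


def fake_scan(key: str) -> None:
    """Hypothetical stand-in for a real scan; one key always fails."""
    if key == "row_3":
        raise RuntimeError("simulated tablet server timeout")


if __name__ == "__main__":
    keys = [f"row_{i}" for i in range(10)]
    print(run_probe(fake_scan, keys, samples=100, rng=random.Random(42)))
```

Run periodically, the summaries from a loop like this could be tracked over time to see whether availability or tail latency shifts after a platform change.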
