On 9/11/17 11:52 PM, Stack wrote:
On Mon, Sep 11, 2017 at 11:07 AM, Vladimir Rodionov <[email protected]>
wrote:

...
That is mostly it. Yes, we have not done real testing with real data on a
real cluster yet, except QA testing on a small OpenStack
cluster (10 nodes). That is probably our biggest minus right now. I
would like to inform the community that this week we are going to start
full-scale testing with reasonably sized data sets.

... Completion of HA seems important, as are the results of the scale testing.


I think we should knock out a rough sketch of what effective "scale" testing would look like, since "scale" is a very subjective term. Let me start the ball rolling with a few things that come to mind.

(interpreting requirements as per rfc2119)

* MUST have >5 RegionServers and >1 Masters in play
* MUST have non-trivial final data sizes (>= 100s of GB)
* MUST have some clear pass/fail determination for correctness of B&R
* MUST have some fault-injection

* SHOULD be a completely automated test, not requiring a human to coordinate or execute commands
* SHOULD be able to acquire operational insight (metrics) while performing operations to determine success of testing
* SHOULD NOT require manual intervention, e.g. working around known issues/limitations
* SHOULD reuse the IntegrationTest framework in hbase-it
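To make the "automated, pass/fail, fault-injected" requirements concrete, here is a rough driver sketch. The class name and the serverKilling monkey policy exist in hbase-it, but the exact loop arguments and paths here are assumptions and vary by HBase version; this is a sketch, not a working harness.

```shell
#!/usr/bin/env bash
# Hypothetical driver sketch: args/paths below are assumptions for illustration.
set -euo pipefail

# Run ITBLL from hbase-it with a ChaosMonkey policy for fault injection.
# The tool's exit code gives us the unattended pass/fail signal required above:
# loop args are (iterations, mappers, nodes-per-mapper, output dir, reducers).
if hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
    -m serverKilling loop 2 10 100000000 /tmp/itbll-out 10; then
  echo "PASS"
else
  echo "FAIL"
  exit 1
fi
```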

Since we have a concern of correctness, ITBLL (IntegrationTestBigLinkedList) sounds like a good starting point, since it avoids re-writing similar verification logic. ChaosMonkey is always great for fault injection.
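For anyone not familiar with why ITBLL gives a clear pass/fail signal: every key it writes carries a pointer to a previously written key, so after a backup/restore cycle any lost row shows up as a dangling reference. A toy sketch of that check (plain Python, not HBase code; the dict stands in for the table):

```python
# Toy sketch of the ITBLL-style correctness check. Each row stores a pointer
# to another key; data loss is detected as pointers to keys that are gone.

def verify(rows):
    """rows maps key -> referenced (previous) key. Returns the set of keys
    that are referenced by some row but missing from the table."""
    present = set(rows)
    referenced = set(rows.values())
    return referenced - present

# A 4-node circular list: every reference resolves, so nothing is missing.
intact = {1: 2, 2: 3, 3: 4, 4: 1}
assert verify(intact) == set()

# Simulate loss during backup/restore: key 3 disappears, and the verifier
# flags it because key 2 still points at it.
damaged = {k: v for k, v in intact.items() if k != 3}
assert verify(damaged) == {3}
```

The real ITBLL Verify step does the same bookkeeping as a MapReduce job over the restored table, which is why reusing it beats writing fresh correctness logic for B&R.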

Thoughts?
