On 9/11/17 11:52 PM, Stack wrote:
On Mon, Sep 11, 2017 at 11:07 AM, Vladimir Rodionov <[email protected]>
wrote:

...
That is mostly it. Yes, we have not done real testing with real data on a
real cluster yet, except QA testing on a small OpenStack
cluster (10 nodes). That is probably our biggest minus right now. I
would like to inform the community that this week we are going to start
full-scale testing with reasonably sized data sets.

... Completion of HA seems important, as are the results of the scale testing.


I think we should knock out a rough sketch of what effective "scale" testing would look like, since "scale" is a very subjective term. Let me start the ball rolling with a few things that come to mind.

(interpreting requirements as per rfc2119)

* MUST have >5 RegionServers and >1 Masters in play
* MUST have non-trivial final data sizes (>= 100s of GB)
* MUST have some clear pass/fail determination for correctness of B&R
* MUST have some fault-injection

* SHOULD be a completely automated test, not requiring a human to coordinate or execute commands
* SHOULD be able to acquire operational insight (metrics) while performing operations to determine success of testing
* SHOULD NOT require manual intervention, e.g. working around known issues/limitations
* SHOULD reuse the IntegrationTest framework in hbase-it
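To make the "automated, pass/fail, fault-injected" requirements concrete, here is a rough driver sketch. The class name and the serverKilling monkey policy exist in hbase-it, but the exact loop arguments and paths here are assumptions and vary by HBase version; this is a sketch, not a working harness.

```shell
#!/usr/bin/env bash
# Hypothetical driver sketch: args/paths below are assumptions for illustration.
set -euo pipefail

# Run ITBLL from hbase-it with a ChaosMonkey policy for fault injection.
# The tool's exit code gives us the unattended pass/fail signal required above:
# loop args are (iterations, mappers, nodes-per-mapper, output dir, reducers).
if hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
    -m serverKilling loop 2 10 100000000 /tmp/itbll-out 10; then
  echo "PASS"
else
  echo "FAIL"
  exit 1
fi
```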

Since we have a concern of correctness, ITBLL (IntegrationTestBigLinkedList) sounds like a good starting point, since it avoids re-writing similar verification logic. ChaosMonkey is always great for fault injection.
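For anyone not familiar with why ITBLL gives a clear pass/fail signal: every key it writes carries a pointer to a previously written key, so after a backup/restore cycle any lost row shows up as a dangling reference. A toy sketch of that check (plain Python, not HBase code; the dict stands in for the table):

```python
# Toy sketch of the ITBLL-style correctness check. Each row stores a pointer
# to another key; data loss is detected as pointers to keys that are gone.

def verify(rows):
    """rows maps key -> referenced (previous) key. Returns the set of keys
    that are referenced by some row but missing from the table."""
    present = set(rows)
    referenced = set(rows.values())
    return referenced - present

# A 4-node circular list: every reference resolves, so nothing is missing.
intact = {1: 2, 2: 3, 3: 4, 4: 1}
assert verify(intact) == set()

# Simulate loss during backup/restore: key 3 disappears, and the verifier
# flags it because key 2 still points at it.
damaged = {k: v for k, v in intact.items() if k != 3}
assert verify(damaged) == {3}
```

The real ITBLL Verify step does the same bookkeeping as a MapReduce job over the restored table, which is why reusing it beats writing fresh correctness logic for B&R.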

Thoughts?
