On 22/10/10 01:10, Konstantin Boudnik wrote:
On Thu, Oct 21, 2010 at 05:53PM, Ian Holsman wrote:
In discussing it with people, I've heard that a major issue (not the only
one, I'm sure) is the lack of resources to actually test the Apache releases on
large clusters, and that it is very hard to get this done in short cycles
(hence the large gap between 20.x and 21).

I do agree the lack of resources for testing Hadoop is a problem. However,
there might be some slight difference in the meaning of the word 'resources' ;)

The only way, IMO, to get reasonable testing done on a system as complex as
Hadoop is to invest in automatic validation of builds at the system level. This
requires a few things (resources, if you will):
   - extra hardware (the easiest and cheapest problem)
   - automatic deployment, testing, and analysis
   - development of system tests that are able to control and observe cluster
     behavior (in other words, something more sophisticated than just shell
     scripts)

And for semi-adequate system testing you don't need a large cluster: 10-20
nodes will be sufficient in most cases. But automating all the processes,
starting from deployment, is the key. Test automation is in slightly better
shape, since Hadoop has a system test framework called Herriot (part of the
Hadoop code base for about 7 months now), but it still needs further
extension.
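
To make "system tests that control and observe cluster behavior" a bit more concrete, here is a minimal sketch of that style of test. It is not the Herriot API; it uses Hadoop's in-process MiniDFSCluster (so it exercises a mini cluster rather than a real deployment), and the class and test names are just illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    public class TestClusterBehaviour {
      @Test
      public void testFileMetadataSurvivesDatanodeLoss() throws Exception {
        Configuration conf = new Configuration();
        // start an in-process cluster with 3 datanodes, freshly formatted
        MiniDFSCluster cluster = new MiniDFSCluster(conf, 3, true, null);
        try {
          FileSystem fs = cluster.getFileSystem();
          Path probe = new Path("/probe");
          fs.create(probe).close();

          // control the cluster: take a datanode down mid-test
          cluster.stopDataNode(0);

          // observe the cluster: the namenode should still know about the file
          assertTrue(fs.exists(probe));
        } finally {
          cluster.shutdown();
        }
      }
    }

A Herriot-style test would do the same kind of control/observe cycle, but against daemons deployed on real nodes rather than threads in one JVM.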


+1 for testing. I would like to help with this, but my test stuff depends on my lifecycle stuff, which I need to sit down with, sync up with trunk, and work out how to get in.

One thing you can do in a virtual world which you can't do in physical space is reconfigure the LAN on the fly, to see what happens. For example, I could set up VLANs for two racks and a switch between them, then partition the two and watch how the cluster reacts, while a simulated external load (separate issue) hits the NN with the same amount of traffic. Fun things.
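
As a rough sketch of what such a partition scenario might look like in test code: the VirtualNetwork interface, the rack names, and injectClientLoad() are all made-up stand-ins for whatever VM/VLAN control API the rig would actually expose; only the shape of the experiment is meant to carry over:

    // Hypothetical scenario driver for the rack-partition experiment above.
    public class RackPartitionScenario {

      /** Stand-in for the VM/VLAN control layer; not a real Hadoop or cloud API. */
      interface VirtualNetwork {
        void partition(String rackA, String rackB);   // cut the inter-rack link
        void heal(String rackA, String rackB);        // restore it
      }

      void run(VirtualNetwork net) throws Exception {
        injectClientLoad();                 // steady stream of requests at the NN (separate harness)
        net.partition("rack1", "rack2");    // reconfigure the VLANs on the fly
        Thread.sleep(10 * 60 * 1000);       // wait for heartbeats to time out, re-replication to start
        // observe: are rack2's datanodes marked dead? does re-replication storm the network?
        net.heal("rack1", "rack2");
        // observe: do the nodes rejoin cleanly? is over-replication cleaned up afterwards?
      }

      void injectClientLoad() {
        // e.g. a pool of threads doing metadata operations against the NN at a fixed rate
      }
    }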
