Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "TestingNov2009" page has been changed by SteveLoughran.
The comment on this change is: Added stuff on OS QAing, some ideas at dealing with the problems.
http://wiki.apache.org/hadoop/TestingNov2009?action=diff&rev1=4&rev2=5

--------------------------------------------------

  There are currently no tests that work with Hadoop via the web pages, and none that cover job submission and monitoring. It is in fact possible to bring up a Hadoop cluster in which JSP doesn't work, but the basic tests all appear to pass, even TeraSort, provided you use the low-level APIs.

- Options
+
+ ''Proposals:''
   * Create a set of JUnit/HtmlUnit tests that test the GUI; design these to run against any host. Either check out the source tree and run them against a remote cluster, or package the tests in a JAR and make this a project distributable.
   * We may need separate test JARs for HDFS and mapreduce.

@@ -63, +64 @@

   * For testing local Hadoop builds on IaaS platforms, the build process needs to scp over and install the Hadoop binaries and the configuration files. This can be done by creating a new disk image that is then used to bootstrap every node, or by starting with a clean base image and copying in Hadoop on demand. The latter is much more agile and cost effective during iterative development, but doesn't scale to very large clusters (1000s of machines), unless you delegate the task of copy/install to the first few tens of allocated machines. For EC2, one tactic is to upload the binaries to S3, and have scripts on the nodes to copy down and install the files.

+ See: [[http://www.netkernel.org/blogxter/entry?publicid=12CE2B62F71239349F3E9903EAE9D1F0 | A Cloud Tools Manifesto]]
+
+ == Qualifying Hadoop on different platforms ==
+
+ Currently Hadoop is only used at scale on RHEL + Sun JVM, because that is what Yahoo! run their clusters on, and nobody else is running different platforms in their production clusters, or if they are, they aren't discussing it in public.
+
+  * It would be interesting to start collecting experiences of running Hadoop on other platforms, different Unix flavours in particular, even if this is not a formal pre-release process.
+  * Windows and OS/X support Hadoop, reluctantly, with Windows being the most reluctant. Nobody admits to using Windows in production, and it may not get tested at any serious scale before a release is made.
+
+ What would it take to test Hadoop releases on different operating systems? We'd need clusters of real or virtual machines, run any cluster qualification tests on them, and publish the results. This would not be a performance game; throughput isn't important, it's more "does this work on a specific OS at 100+ machines?"
+
+
  == Exploring the Hadoop Configuration Space ==

  There are a lot of Hadoop configuration options, even ignoring those of the underlying machines and network. For example, what impact do blocksize and replication factor have on your workload? What different network card configuration parameters give the best performance? Which combinations of options break things?

@@ -73, +86 @@

   * There is existing work on automated configuration testing, notably the work done by Adam Porter and colleagues on [[http://www.youtube.com/watch?v=r0nn40O3mCY | Distributed Continuous Quality Assurance]]
   * (Steve says) in HP we've used a Pseudo-RNG to drive transforms to the infrastructure and deployed applications; this explores some of the space and is somewhat replicable.
+
+ ''Proposal:'' Make this a research topic, pull in the experts in testing, and give encouragement to work on this problem. Offering cluster time may help.
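The pseudo-RNG idea above can be illustrated with a minimal sketch: a seeded generator picks option values, so a run that breaks something can be replayed from its seed. This is not the HP tooling mentioned in the text. The helper names (`sample_configs`, `to_site_xml`) and the candidate values are assumptions made for illustration, and the property names are ones used by Hadoop releases of this period, to be checked against the release under test.

```python
import random
from xml.sax.saxutils import escape

# Illustrative option space. The candidate values are assumptions chosen for
# this sketch, not recommendations; verify property names against your release.
OPTION_SPACE = {
    "dfs.block.size": [32 * 1024 * 1024, 64 * 1024 * 1024, 128 * 1024 * 1024],
    "dfs.replication": [1, 2, 3],
    "io.sort.mb": [50, 100, 200],
}


def sample_configs(seed, count, space=OPTION_SPACE):
    """Return `count` pseudo-random configurations; the same seed replays the same list."""
    rng = random.Random(seed)
    configs = []
    for _ in range(count):
        # Walk the option names in sorted order so the sequence of draws
        # does not depend on dictionary ordering.
        configs.append({name: rng.choice(space[name]) for name in sorted(space)})
    return configs


def to_site_xml(config):
    """Render one configuration in the <configuration>/<property> XML layout."""
    lines = ["<configuration>"]
    for name in sorted(config):
        lines.append(
            "  <property><name>%s</name><value>%s</value></property>"
            % (escape(name), escape(str(config[name])))
        )
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    for cfg in sample_configs(seed=2009, count=3):
        print(to_site_xml(cfg))
```

Recording the seed alongside each test result is what makes the exploration "somewhat replicable": the same seed and option space regenerate the same configurations for a re-run.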
  == Testing applications that run on Hadoop ==

@@ -97, +112 @@

   * Network failures can be simulated on some IaaS platforms just by breaking a virtual link
   * Forcibly killing processes is a more realistic approach which works on most platforms, though it is hard to choreograph
+
+ ''Proposal:'' Any Infrastructure API should offer the opportunity to simulate failures, either by turning off nodes without warning, or (better) by breaking the network connections between live nodes.
+
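As a rough sketch of what such a failure-simulation hook might look like, the following uses an in-memory stand-in for an infrastructure API. Everything here is hypothetical: `FakeInfrastructure`, `kill_node`, `break_link`, `plan_failures` and `apply_plan` are names invented for this example and do not correspond to any real IaaS provider's API. The plan is driven by a seeded generator, which addresses the choreography problem by making a failure sequence repeatable.

```python
import random


class FakeInfrastructure:
    """In-memory stand-in for a hypothetical infrastructure API (not a real client)."""

    def __init__(self, nodes):
        self.live = set(nodes)
        self.broken_links = set()

    def kill_node(self, node):
        # Simulates turning a node off without warning.
        self.live.discard(node)

    def break_link(self, a, b):
        # Simulates cutting the network connection between two nodes.
        self.broken_links.add(frozenset((a, b)))


def plan_failures(nodes, seed, count):
    """Build a repeatable list of failure actions; the same seed gives the same plan."""
    rng = random.Random(seed)
    live = sorted(nodes)
    plan = []
    for _ in range(count):
        if len(live) < 2:
            break  # too few live nodes left to break anything meaningful
        if rng.random() < 0.5:
            victim = rng.choice(live)
            live.remove(victim)
            plan.append(("kill_node", (victim,)))
        else:
            # Links are only broken between nodes that are still live in the plan.
            a, b = rng.sample(live, 2)
            plan.append(("break_link", (a, b)))
    return plan


def apply_plan(infra, plan):
    """Replay the planned actions against any object offering the two methods."""
    for action, args in plan:
        getattr(infra, action)(*args)


if __name__ == "__main__":
    nodes = ["node%d" % i for i in range(1, 6)]
    plan = plan_failures(nodes, seed=7, count=4)
    infra = FakeInfrastructure(nodes)
    apply_plan(infra, plan)
    print(plan)
    print(sorted(infra.live))
```

Keeping the plan separate from its execution means the same sequence could, in principle, be replayed against a real backend that exposes equivalent operations.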
