[Hadoop Wiki] Update of "TestingNov2009" by SteveLoughr an

Apache Wiki Sun, 22 Nov 2009 04:24:51 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "TestingNov2009" page has been changed by SteveLoughran.
The comment on this change is: Added stuff on OS QAing, some ideas at dealing 
with the problems.
http://wiki.apache.org/hadoop/TestingNov2009?action=diff&rev1=4&rev2=5

--------------------------------------------------

  
  There are currently no tests that work with Hadoop via the web pages, no job 
submission and monitoring. It is in fact possible to bring up a Hadoop cluster 
in which JSP doesn't work, but the basic tests all appear well -even including 
TeraSort, provided you use the low-level APIs.
  
- Options
+ 
+ ''Proposals:''
   * Create a set of JUnit/HtmlUnit tests that test the GUI; design these to 
run against any host. Either check out the source tree and run the against a 
remote cluster, or package the tests in a JAR and make this a project 
distributable. 
  * We may need separate test JARs for HDFS and mapreduce.
  
@@ -63, +64 @@

   * For testing local Hadoop builds on IaaS platforms, the build process needs 
to scp over and install the Hadoop binaries and the configuration files. This 
can be done by creating a new disk image that is then used to bootstrap every 
node, or you start with a base clean image and copy in Hadoop on demand. The 
latter is much more agile and cost effective during iterative development, but 
doesn't scale to very-large clusters (1000s of machines), unless you delegate 
the task of copy/install to the first few tens of allocated machines. For EC2, 
one tactic is to upload the binaries to S3, and have scripts on the nodes to 
copy down and install the files.
  
  
+ See: 
[[http://www.netkernel.org/blogxter/entry?publicid=12CE2B62F71239349F3E9903EAE9D1F0
 | A Cloud Tools Manifesto]]
+ 
+ == Qualifying Hadoop on different platforms ==
+ 
+ Currently Hadoop is only used at scale on RHEL + Sun JVM, because that is 
what Yahoo! run their clusters on, and nobody else is running different 
platforms in their production clusters -or if they are, they aren't discussing 
it in public.
+ 
+  * It would be interesting to start collecting experiences with running 
Hadoop on other platforms -different Unix flavours in particular, even if this 
is not a formal pre-release process.
+  * Windows and OS/X support Hadoop, reluctantly, with Windows being the most 
reluctant. Nobody admits to using Windows in production, and it may not get 
tested at any serious scale before a release is made. 
+ 
+ What would it take to test Hadoop releases on different operating systems? 
We'd need clusters of real or virtual machines and then run any cluster 
qualification tests on them; publish the results. This would not be a 
performance game; throughput isn't important, it's more "does this work on a 
specific OS at 100+ machines"?
+ 
+ 
  == Exploring the Hadoop Configuration Space ==
  
  There are a lot of Hadoop configuration options, even ignoring those of the 
underlying machines and network. For example, what impact does blocksize and 
replication factor have on your workload? What different network card 
configuration parameters give the best performance? Which combinations of 
options break things?
@@ -73, +86 @@

  
   * There is existing work on automated configuration testing, notably the 
work done by Adam Porter and colleagues on 
[[http://www.youtube.com/watch?v=r0nn40O3mCY | Distributed Continuous Quality 
Assurance]]
   * (Steve says) in HP we've used a Pseudo-RNG to drive transforms to the 
infrastructure and deployed applications, this explores some of the space and 
is somewhat replicable.
+ 
+ ''Proposal:'' Make this a research topic, pull in the experts in testing, and 
give encouragement to work on this problem. Offering cluster time may help.
  
  == Testing applications that run on Hadoop ==
  
@@ -97, +112 @@

   * Network failures can be simulated on some IaaS platforms just by breaking 
a virtual link
   * Forcibly killing processes is a more realistic approach which works on 
most platforms, though it is hard to choreograph
  
+ 
+ ''Proposal:'' Any Infrastructure API ought should offer the opportunity to 
simulate failures, either by turning off nodes without warning, or (better) 
breaking the network connections between live nodes.
+

[Hadoop Wiki] Update of "TestingNov2009" by SteveLoughr an

Reply via email to