Hadoop can be run on a cluster of heterogeneous hardware. Currently,
Hadoop clusters really only run well on Linux, although you can run a
Hadoop client on non-Linux machines.
You will need a separate configuration for each machine in your
cluster based on its hardware profile. Ideally, you'll be able to
group the machines in your cluster into "classes" of machines
(e.g. machines with 1GB of RAM and 2 cores versus 4GB of RAM and 4
cores) to reduce the burden of managing multiple configurations. If
you are talking about a Hadoop cluster that is completely
heterogeneous (every machine is different), the management overhead
could be high.
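To make the "classes" idea concrete, here is a minimal sketch (the
directory names and slot values are hypothetical, not recommendations)
of keeping one hadoop-site.xml per class and deploying the matching
file to each node; the fragments go inside each file's
<configuration> element:

    <!-- conf-small/hadoop-site.xml: for the 2-core / 1GB machines -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>  <!-- roughly one map slot per core -->
    </property>

    <!-- conf-large/hadoop-site.xml: for the 4-core / 4GB machines -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>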
Configuration variables like "mapred.tasktracker.map.tasks.maximum"
and "mapred.tasktracker.reduce.tasks.maximum" should be set based on
the number of cores and the amount of memory in each machine.
Variables like "mapred.child.java.opts" need to be set differently
based on the amount of memory the machine has (e.g. "-Xmx250m"). You
should have at least 250MB of memory dedicated to each task, although
more is better. It's also wise to make sure that each task gets the
same amount of memory regardless of the machine it's scheduled on;
otherwise, tasks might succeed or fail depending on which machine gets
the task, and this asymmetry will make debugging harder.
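For instance (the values are illustrative, not tuned recommendations),
the per-task heap can be kept identical in every class's
hadoop-site.xml while the slot counts differ per class:

    <!-- same value in every class's file, so a task gets the same
         heap no matter which machine it lands on -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx250m</value>
    </property>

    <!-- per-class: size the slots to the machine's cores and memory,
         e.g. 2 on the small class above, 4 on the large class -->
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>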
You can use our online configurator (http://www.cloudera.com/configurator/)
to generate optimized configurations for each class of machines in
your cluster. It will ask a few simple questions about your cluster
and then produce a hadoop-site.xml file.
Good luck!
-Matt
On Jun 18, 2009, at 8:33 AM, ashish pareek wrote:
Can you tell me a few of the challenges in configuring a heterogeneous
cluster, or pass on a link where I can find some information regarding
the challenges of running Hadoop on heterogeneous hardware?
One more thing: how about running different applications on the same
Hadoop cluster, and what challenges are involved in that?
Thanks,
Regards,
Ashish
On Thu, Jun 18, 2009 at 8:53 PM, jason hadoop
<jason.had...@gmail.com> wrote:
I don't know anyone who has a completely homogeneous cluster, so
Hadoop is scalable across heterogeneous environments.
I stated that configuration is simpler if the machines are similar
(there are optimizations in configuration for near-homogeneous
machines).
On Thu, Jun 18, 2009 at 8:10 AM, ashish pareek <pareek...@gmail.com>
wrote:
Does that mean Hadoop is not scalable wrt heterogeneous environments?
And one more question: can we run different applications on the same
Hadoop cluster?
Thanks.
Regards,
Ashish
On Thu, Jun 18, 2009 at 8:30 PM, jason hadoop
<jason.had...@gmail.com> wrote:
Hadoop has always been reasonably agnostic wrt hardware and
homogeneity.
There are optimizations in configuration for near homogeneous
machines.
On Thu, Jun 18, 2009 at 7:46 AM, ashish pareek
<pareek...@gmail.com>
wrote:
Hello,
I am doing my master's, and my final year project is on Hadoop, so I
would like to know something about Hadoop clusters, i.e., are the new
versions of Hadoop able to handle heterogeneous hardware? If you have
any information regarding this, please mail me, as my project is in a
heterogeneous environment.
Thanks!
Regards,
Ashish Pareek
--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals