Both Hadoop and virtualization are means to an end. That end is to consolidate workloads traditionally deployed to separate servers so the average utilization and ROI of a given server increases.
Companies looking to consolidate data-intensive computation may be better served moving to Hadoop infrastructure than a virtualization project. Let me give you an example:

> From: Saikat Kanjilal [mailto:sxk1...@hotmail.com]
> By assigning a virtual machine to a datanode, we effectively isolate
> the datanode from the load on the machine caused by other processes,
> making the datanode more responsive/reliable.

One can set up virtual partitions of CPU and RAM resources that can be fairly independent, but attempting to stack I/O intensive workloads on top of each other via virtualization is a recipe for lower performance, negative ROI, and dissatisfied users.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

----- Original Message -----
> From: "Segel, Mike" <mse...@navteq.com>
> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>; "mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>
> Cc:
> Sent: Friday, September 9, 2011 10:45 AM
> Subject: RE: Research projects for hadoop
>
> Why would you want to take a perfectly good machine and then try to virtualize it?
> I mean if I have 4 quad core cpus, I can run a lot of simultaneous map tasks.
> However if I virtualize the box, I lose at least 1 core per VM so I end up with 4 nodes that have less capabilities and performance than I would have under my original box....
>
> -----Original Message-----
> From: Saikat Kanjilal [mailto:sxk1...@hotmail.com]
> Sent: Friday, September 09, 2011 10:59 AM
> To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> Subject: Research projects for hadoop
>
> Hi Folks,
> I was looking through the following wiki page: http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering if there's been any work done (or any interest to do work) for the following topics:
>
> Integration of Virtualization (such as Xen) with Hadoop tools
> How does one integrate sandboxing of arbitrary user code in C++ and other languages in a VM such as Xen with the Hadoop framework? How does this interact with SGE, Torque, Condor?
> As each individual machine has more and more cores/cpus, it makes sense to partition each machine into multiple virtual machines. That gives us a number of benefits:
> - By assigning a virtual machine to a datanode, we effectively isolate the datanode from the load on the machine caused by other processes, making the datanode more responsive/reliable.
> - With multiple virtual machines on each machine, we can lower the granularity of hod scheduling units, making it possible to schedule multiple tasktrackers on the same machine, improving the overall utilization of the whole clusters.
> - With virtualization, we can easily snapshot a virtual cluster before releasing it, making it possible to re-activate the same cluster in the future and start to work from the snapshot.
>
> Provisioning of long running Services via HOD
> Work on a computation model for services on the grid. The model would include:
> - Various tools for defining clients and servers of the service, and at the least a C++ and Java instantiation of the abstractions
> - Logical definitions of how to partition work onto a set of servers, i.e. a generalized shard implementation
> - A few useful abstractions like locks (exclusive and RW, fairness), leader election, transactions,
> - Various communication models for groups of servers belonging to a service, such as broadcast, unicast, etc.
> - Tools for assuring QoS, reliability, managing pools of servers for a service with spares, etc.
> - Integration with HDFS for persistence, as well as access to local filesystems
> - Integration with ZooKeeper so that applications can use the namespace
>
> I would like to either help out with a design for the above or prototyping code, please let me know if and what the process may be to move forward with this.
> Regards
>
> The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
>
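
For anyone who wants to put numbers on the core-count argument quoted in this thread, here is a minimal sketch of the arithmetic. The function name and its parameters are hypothetical, made up for illustration. The inputs (4 quad-core CPUs, 4 VMs, and one core lost per VM) are taken from the quoted message, which says "at least 1 core per VM", so treat the result as an upper bound on what remains rather than a measurement.

```python
def cores_left_for_tasks(sockets, cores_per_socket, vms, overhead_cores_per_vm):
    """Return the cores still available for map tasks after per-VM overhead.

    All parameters are illustrative assumptions; real hypervisor overhead
    depends on the hypervisor, the workload, and how I/O is virtualized.
    """
    total = sockets * cores_per_socket
    overhead = vms * overhead_cores_per_vm
    if overhead > total:
        raise ValueError("overhead exceeds the physical core count")
    return total - overhead


# Bare metal: 4 quad-core CPUs, no VMs, so no virtualization overhead.
bare_metal = cores_left_for_tasks(4, 4, vms=0, overhead_cores_per_vm=1)

# Same box split into 4 VMs, losing one core per VM (figure from the thread).
virtualized = cores_left_for_tasks(4, 4, vms=4, overhead_cores_per_vm=1)

print(bare_metal, virtualized)  # 16 12
```

Under those assumptions the box gives up a quarter of its cores before any I/O contention between the VMs is counted, and I/O contention is the larger concern raised in the reply at the top of this thread.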