Re: Research projects for hadoop

Andrew Purtell Fri, 09 Sep 2011 11:25:01 -0700

Both Hadoop and virtualization are means to an end. That end is to consolidate 
workloads traditionally deployed to separate servers so the average utilization 
and ROI of a given server increases.


Companies looking to consolidate data-intensive computation may be better 
served by moving to Hadoop infrastructure than by a virtualization project. 
Let me give you an example:

> From: Saikat Kanjilal [mailto:sxk1...@hotmail.com]
> By assigning a virtual machine to a datanode, we effectively isolate 
> the datanode from the load on the machine caused by other processes, making 
> the datanode more responsive/reliable.


One can set up virtual partitions of CPU and RAM resources that can be fairly 
independent, but attempting to stack I/O intensive workloads on top of each 
other via virtualization is a recipe for lower performance, negative ROI, 
and dissatisfied users.
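
To put rough numbers on the core-count point Mike makes in the quoted message 
below, here is a minimal sketch. It takes his figures as given (4 quad-core 
CPUs, 4 VMs, at least 1 core lost per VM). The function name and the fixed 
per-VM overhead are illustrative assumptions, not measurements, and the sketch 
says nothing about the I/O contention described above.

```python
def usable_cores(sockets, cores_per_socket, vms, cores_lost_per_vm=1):
    """Cores left for map tasks after a flat per-VM overhead.

    Hypothetical model: assumes each VM costs a fixed number of cores,
    as in the quoted message; real hypervisor overhead varies.
    """
    total = sockets * cores_per_socket
    overhead = vms * cores_lost_per_vm
    if overhead > total:
        raise ValueError("per-VM overhead exceeds physical core count")
    return total - overhead


bare_metal = usable_cores(4, 4, vms=0)   # no virtualization
virtualized = usable_cores(4, 4, vms=4)  # box split into 4 VMs
print(bare_metal, virtualized)
```

Under those assumptions the box drops from 16 to 12 cores available for 
tasks, before any I/O effects are counted.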

Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


----- Original Message -----
> From: "Segel, Mike" <mse...@navteq.com>
> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>; 
> "mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>
> Cc: 
> Sent: Friday, September 9, 2011 10:45 AM
> Subject: RE: Research projects for hadoop
> 
> Why would you want to take a perfectly good machine and then try to 
> virtualize it?
> I mean if I have 4 quad core cpus, I can run a lot of simultaneous map tasks. 
> However if I virtualize the box, I lose at least 1 core per VM so I end up 
> with 4 nodes that have less capabilities and performance than I would have 
> under my original box....
> 
> 
> -----Original Message-----
> From: Saikat Kanjilal [mailto:sxk1...@hotmail.com]
> Sent: Friday, September 09, 2011 10:59 AM
> To: common-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> Subject: Research projects for hadoop
> 
> 
> Hi Folks,
> I was looking through the following wiki page:
> http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering if
> there's been any work done (or any interest to do work) for the following
> topics:
>
> Integration of Virtualization (such as Xen) with Hadoop tools
>
> - How does one integrate sandboxing of arbitrary user code in C++ and other
>   languages in a VM such as Xen with the Hadoop framework? How does this
>   interact with SGE, Torque, Condor?
> - As each individual machine has more and more cores/cpus, it makes sense to
>   partition each machine into multiple virtual machines. That gives us a
>   number of benefits:
>   - By assigning a virtual machine to a datanode, we effectively isolate the
>     datanode from the load on the machine caused by other processes, making
>     the datanode more responsive/reliable.
>   - With multiple virtual machines on each machine, we can lower the
>     granularity of hod scheduling units, making it possible to schedule
>     multiple tasktrackers on the same machine, improving the overall
>     utilization of the whole clusters.
>   - With virtualization, we can easily snapshot a virtual cluster before
>     releasing it, making it possible to re-activate the same cluster in the
>     future and start to work from the snapshot.
>
> Provisioning of long running Services via HOD
>
> - Work on a computation model for services on the grid. The model would
>   include:
>   - Various tools for defining clients and servers of the service, and at the
>     least a C++ and Java instantiation of the abstractions
>   - Logical definitions of how to partition work onto a set of servers, i.e. a
>     generalized shard implementation
>   - A few useful abstractions like locks (exclusive and RW, fairness), leader
>     election, transactions,
>   - Various communication models for groups of servers belonging to a
>     service, such as broadcast, unicast, etc.
>   - Tools for assuring QoS, reliability, managing pools of servers for a
>     service with spares, etc.
>   - Integration with HDFS for persistence, as well as access to local
>     filesystems
>   - Integration with ZooKeeper so that applications can use the namespace
>
> I would like to either help out with a design for the above or prototyping
> code, please let me know if and what the process may be to move forward with
> this.
> Regards
