RE: Research projects for hadoop

Saikat Kanjilal Fri, 09 Sep 2011 10:34:51 -0700

How about using VirtualBox and 64-bit CentOS to serve as a Linux container for 
isolating map/reduce processes? I have set this up in the past; it's really 
easy.



> From: ev...@yahoo-inc.com
> To: mapreduce-dev@hadoop.apache.org
> Date: Fri, 9 Sep 2011 10:30:37 -0700
> Subject: Re: Research projects for hadoop
> 
> The biggest issue with Xen and other virtualization technologies is that 
> there is often an IO penalty involved in using them. For many jobs this is 
> not an acceptable trade-off. I do know, however, that there has been some 
> discussion about using Linux Containers for isolation of Map/Reduce 
> processes. I don't know whether any JIRA has been filed for it, but 
> containers are much lighter weight than Xen and other virtualization tech, 
> because all they are really concerned with is resource isolation, not 
> virtualizing an entire operating system.
> 
> --Bobby Evans
> 
> On 9/9/11 10:58 AM, "Saikat Kanjilal" <sxk1...@hotmail.com> wrote:
> 
> 
> 
> Hi Folks, I was looking through the following wiki page: 
> http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering whether 
> there's been any work done (or any interest in doing work) on the following 
> topics:
> 
> Integration of Virtualization (such as Xen) with Hadoop tools
> 
>   How does one integrate sandboxing of arbitrary user code in C++ and other
>   languages in a VM such as Xen with the Hadoop framework? How does this
>   interact with SGE, Torque, Condor?
> 
>   As each individual machine has more and more cores/CPUs, it makes sense to
>   partition each machine into multiple virtual machines. That gives us a
>   number of benefits:
>   - By assigning a virtual machine to a datanode, we effectively isolate the
>     datanode from the load on the machine caused by other processes, making
>     the datanode more responsive/reliable.
>   - With multiple virtual machines on each machine, we can lower the
>     granularity of HOD scheduling units, making it possible to schedule
>     multiple tasktrackers on the same machine and improving the overall
>     utilization of the whole cluster.
>   - With virtualization, we can easily snapshot a virtual cluster before
>     releasing it, making it possible to re-activate the same cluster in the
>     future and start to work from the snapshot.
> 
> Provisioning of long-running Services via HOD
> 
>   Work on a computation model for services on the grid. The model would
>   include:
>   - Various tools for defining clients and servers of the service, and at
>     the least a C++ and Java instantiation of the abstractions
>   - Logical definitions of how to partition work onto a set of servers,
>     i.e. a generalized shard implementation
>   - A few useful abstractions like locks (exclusive and RW, fairness),
>     leader election, transactions
>   - Various communication models for groups of servers belonging to a
>     service, such as broadcast, unicast, etc.
>   - Tools for assuring QoS, reliability, managing pools of servers for a
>     service with spares, etc.
>   - Integration with HDFS for persistence, as well as access to local
>     filesystems
>   - Integration with ZooKeeper so that applications can use the namespace
> 
> I would like to help out with either a design for the above or prototype 
> code. Please let me know whether that is possible and what the process 
> would be for moving forward with it.
> Regards
>
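
The wiki item quoted above asks for "a generalized shard implementation" without saying which partitioning scheme is meant. As one possible illustration (an assumption, not something proposed in this thread), a consistent-hash ring maps keys to servers so that removing a server only moves the keys that server owned. The class `ShardRing`, its `replicas` parameter, and the server names are hypothetical; the sketch uses only Python's standard `bisect` and `hashlib` modules.

```python
import bisect
import hashlib


class ShardRing:
    """Consistent-hash ring mapping keys to servers (hypothetical sketch)."""

    def __init__(self, servers, replicas=100):
        # replicas: virtual points per server; more points give a more even spread.
        self.replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        # Stable 64-bit position; Python's built-in hash() of str varies per process.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, server):
        # Place `replicas` virtual points for this server on the ring.
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        # Drop only this server's points; every other key keeps its owner.
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        if not self._points:
            raise LookupError("no servers on the ring")
        # First point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

For example, `ShardRing(["server-1", "server-2", "server-3"]).lookup("user:42")` returns whichever server owns that key. A plain `hash(key) % N` scheme would remap almost every key when N changes; the ring trades a little lookup cost for that stability, which matters when servers in a pool come and go.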
