How about using VirtualBox and 64-bit CentOS to serve as a Linux container for isolating Map/Reduce processes? I have set this up in the past; it's really easy.
> From: ev...@yahoo-inc.com
> To: mapreduce-dev@hadoop.apache.org
> Date: Fri, 9 Sep 2011 10:30:37 -0700
> Subject: Re: Research projects for hadoop
>
> The biggest issue with Xen and other virtualization technologies is that
> often there is an IO penalty involved with using them. For many jobs this is
> not an acceptable trade off. I do know, however, that there has been some
> discussion about using Linux Containers for isolation of Map/Reduce
> processes. I don't know if any JIRA has been filed for it or not, but they
> are much lighter weight than Xen and other virtualization tech, because all
> it really is concerned with is resource isolation, and not virtualizing an
> entire operating system.
>
> --Bobby Evans
>
> On 9/9/11 10:58 AM, "Saikat Kanjilal" <sxk1...@hotmail.com> wrote:
>
> Hi Folks, I was looking through the following wiki page:
> http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering if
> there's been any work done (or any interest to do work) for the following
> topics:
>
> Integration of Virtualization (such as Xen) with Hadoop tools
>
> - How does one integrate sandboxing of arbitrary user code in C++ and other
>   languages in a VM such as Xen with the Hadoop framework? How does this
>   interact with SGE, Torque, Condor?
> - As each individual machine has more and more cores/cpus, it makes sense to
>   partition each machine into multiple virtual machines. That gives us a
>   number of benefits:
>   - By assigning a virtual machine to a datanode, we effectively isolate the
>     datanode from the load on the machine caused by other processes, making
>     the datanode more responsive/reliable.
>   - With multiple virtual machines on each machine, we can lower the
>     granularity of hod scheduling units, making it possible to schedule
>     multiple tasktrackers on the same machine, improving the overall
>     utilization of the whole cluster.
>   - With virtualization, we can easily snapshot a virtual cluster before
>     releasing it, making it possible to re-activate the same cluster in the
>     future and start to work from the snapshot.
>
> Provisioning of long running Services via HOD
>
> - Work on a computation model for services on the grid. The model would
>   include:
>   - Various tools for defining clients and servers of the service, and at
>     the least a C++ and Java instantiation of the abstractions
>   - Logical definitions of how to partition work onto a set of servers,
>     i.e. a generalized shard implementation
>   - A few useful abstractions like locks (exclusive and RW, fairness),
>     leader election, transactions
>   - Various communication models for groups of servers belonging to a
>     service, such as broadcast, unicast, etc.
>   - Tools for assuring QoS, reliability, managing pools of servers for a
>     service with spares, etc.
>   - Integration with HDFS for persistence, as well as access to local
>     filesystems
>   - Integration with ZooKeeper so that applications can use the namespace
>
> I would like to either help out with a design for the above or prototyping
> code; please let me know if and what the process may be to move forward with
> this.
>
> Regards