----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4717/#review6925 -----------------------------------------------------------
I'd prefer we avoid this approach where we need to discover the VM IP and SSH into it. Can we instead pass boot options and have a startup script on the VM launch something that reads a protobuf contact address out of the boot options and retrieves a (protobuf-format) description of what to launch? [This should also avoid fragility from generating a shell script.] On a lower priority, there are some hard-coded paths and some use of mesos_home that won't work; we need to handle (a) out-of-source builds and (b) installation to "standard" paths ($(prefix)/bin $(prefix)/libexec/mesos, etc.). bin/find_addr.pl <https://reviews.apache.org/r/4717/#comment15429> Remove commented out code (and elsewhere). bin/find_addr.pl <https://reviews.apache.org/r/4717/#comment15430> Suggest my %mac_addrs; $mac_addrs{lc $node_value} = 1; above and if ($mac_addrs{$this_addr}) { ... } here. bin/killtree.sh <https://reviews.apache.org/r/4717/#comment15431> How does this relate to our existing killtree script? src/config/config.hpp <https://reviews.apache.org/r/4717/#comment15432> Why is this file being added? src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15434> Remove. src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15435> How is this method supposed to differ from run()? Can we share most of the code? src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15436> Please, not in this patch. src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15447> Why not hadoopHome? src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15437> Do we need quoting for this? src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15456> The values may be require quoting. src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15438> Make this path a config option instead. If we can provide a sensible default version, it should be installed to libexec. src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15448> notifySlaveOfExecutor? src/launcher/launcher.cpp <https://reviews.apache.org/r/4717/#comment15449> What is this supposed to do? src/launcher/vm_mesos_launcher.cpp <https://reviews.apache.org/r/4717/#comment15450> Move to file under src/utils (and combine with launcher.cpp)? src/slave/isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15440> Should this actually work outside Linux? (This also affects src/Makefile.am) src/slave/slave.cpp <https://reviews.apache.org/r/4717/#comment15439> Remove src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15457> Is some code missing here? Can this code be shared with the LXC isolation module? src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15441> This code (and elsewhere in this file) should be shared with LXC rather than copied. src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15443> This path should be configurable (and elsewhere) src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15442> LOG(FATAL) src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15452> s/task/executor/g src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15444> Replace with a more specific directory configuration option (something under work_dir?) -- especially since it needs to be writable. src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15455> You need a different way to find the installed location of the utility program (see, e.g., what we do for killtree). src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15445> Fix indention (and elsewhere in this function). src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15453> Why does this function have a different name than launchVirtualTask()? src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15454> Make scp, ssh command configurable (so options, like the location of an identity file, instructions to turn off host key validation, etc. can be passed) src/slave/vm_isolation_module.cpp <https://reviews.apache.org/r/4717/#comment15446> We should not have special cases for spark. - Charles On 2012-04-14 03:57:12, Charles Earl wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4717/ > ----------------------------------------------------------- > > (Updated 2012-04-14 03:57:12) > > > Review request for mesos, Benjamin Hindman and Matei Zaharia. > > > Summary > ------- > > Earlier in the year I implemented a virtual machine isolation module. This > module uses lib-virt to launch and manage virtual machine containers. The > code is still rough and have done basic testing with the Spark example. > > This code works with the KVM (http://www.linux-kvm.org/page/Main_Page) > virtual machine manager. I've placed the relevant code in a branch called > mesos-vm, for now located at https://github.com/charlescearl/VirtualMesos. > The code is based upon the mesos lxc isolation module that is located in > src/slave/lxc_isolation_module.cpp/.hpp. My code based on the mesos master > branch dated Wed Nov 23 12:02:07 2011 -0800, commit > 059aabb2ec5bd7b20ed08ab9c439531a352ba3ec. I've included a patch for the > relevant code included for the review. Suggestions appreciated on whether > this is the appropriate branch/commit to patch against. > > Most of the implementation is contained in vm_isolation_module.cpp and > vm_isolation_module.hpp and there are some minor additions in launcher to > handle setup of the environment for the virtual machine. I use the libvirt > (http://libvirt.org/) library, to manage the virtual machine container in > which the jobs are executed. > > Dependencies > The code has been tested on Ubuntu 11.04 and 11.10 and depends on > libpython2.6 and libvirt0 > > Configuration of the virtual machine container > The virtual machine invocation depends upon a few configuration assumptions: > 1. ssh public keys installed on the container. I assume that the container > is setup to allow password-less secure access. > 2. Directory structure on the container matches the servant machine. For > example, in invoking a spark executor, assume that the paths match the setup > on the container host. > > Running it > In the $MESOS_HOME/conf/mesos.conf file add the line > isolation=vm > to use the virtual machine isolation. > > The Mesos slave is invoked with the isolation parameter set to vm. For > example > sudo bin/mesos-slave -m mesos://master@mesos-host:5050 -w 9839 > --isolation=vm > > Rough description of how it works > > The `vm_isolation_module` class forks a process that in turn launches a > virtual machine. A routine located in bin called find_addr.pl is responsible > for figuring out the IP address of the launched virtual machine. This is > probably not portable since it is explicitly looking for entry in the virbr0 > network. > > A script vmLauncherTemplate.sh located in bin assists the the vmLauncher > method to setup the environment for launching tasks inside of the virtual > machine. The vmLauncher method uses vmLauncherTemplate.sh to create a tasks > specific shell vmLauncherTemplate-<task_id>.sh, which is copied to the > running guest and used to run the executor inside the VM. This communicates > with the slave on the host. > > Comments and suggestions on improvements and next directions are appreciated! > > > Diffs > ----- > > bin/find_addr.pl PRE-CREATION > bin/killtree.sh PRE-CREATION > bin/vmLauncher.sh PRE-CREATION > bin/vmLauncherTemplate.sh PRE-CREATION > src/config/config.hpp PRE-CREATION > src/launcher/launcher.hpp b99b6d2 > src/launcher/launcher.cpp 4422224 > src/launcher/vm_mesos_launcher.cpp PRE-CREATION > src/slave/isolation_module.cpp 5b7b4a2 > src/slave/isolation_module_factory.cpp 6498945 > src/slave/lxc_isolation_module.cpp ab0843a > src/slave/main.cpp 9519ed2 > src/slave/slave.cpp 21fc9f2 > src/slave/vm_isolation_module.hpp PRE-CREATION > src/slave/vm_isolation_module.cpp PRE-CREATION > > Diff: https://reviews.apache.org/r/4717/diff > > > Testing > ------- > > This was run with the spark example on single KVM virtual machine. Not tested > extensively. > > > Thanks, > > Charles > >
