> Apologies for the unusual post but in the absence of a user list I am
> sort of left without other options. :-)

No worries. We hope to create a user@ list soon, once we see an
increase in emails like this on dev@. So, welcome and keep them coming!

> I have been keeping an eye on Myriad as we run a number of Hadoop
> nodes next to a bunch of Mesos-friendly use-cases and currently
> struggle to manage resourcing boundaries.

Glad to hear (again) that this is precisely the problem Myriad set out
to solve.

> How would a MapR NM container connect back to YARN without running
> MFS inside the container?

Currently the NM doesn't run inside a (Docker) container, either with
or without Myriad. But the plan is to get there in the future to
support multi-tenancy. If you can, please share your story on why you'd
want to run the NM inside a container; it helps us work together on the
right problems.

In this context, I'd also like to mention that Swapnil, Sarjeet and
Mitra from MapR started working on this problem (and won 1st prize) at
the Docker Global Hack Day event. Check out:
https://blog.docker.com/2015/09/docker-global-hack-day-3-winners/

> Failure to do so (using the default config) will usually (always?)
> result in a failure of the NM to start.

Always. You are spot on.

> I imagine this behaviour may be worked around but I could not find
> any documentation about it.

Currently MapR doesn't support running the NM inside containers. The
createTTVolume.sh script is coded to be invoked on the same host as the
MFS. So when the NM runs inside a container, the container's hostname
doesn't match the hostname of the physical host (unless you use
Docker's HOST networking mode), and that mismatch causes the script to
fail.

Recently, a couple of us experimented with this. We modified the script
to pick up the hostname from an env variable (as opposed to `hostname
--fqdn`) and passed the host's hostname into that variable while
launching the container. With that we were able to run the NM inside
the container and have it create a local volume on the MFS running on
the host.
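To make this concrete, here is a minimal sketch of the kind of change
we tried. The variable name (NM_HOST_FQDN) and the image name are
purely illustrative, not MapR's actual implementation:

    # In createTTVolume.sh: prefer an injected FQDN, and fall back to
    # the local lookup when running directly on the host.
    NM_HOST_FQDN="${NM_HOST_FQDN:-$(hostname --fqdn)}"

    # When launching the NM container, pass the physical host's FQDN
    # in (the image name is made up for the example):
    docker run -d \
      -e NM_HOST_FQDN="$(hostname --fqdn)" \
      mapr/nodemanager

Alternatively, Docker's HOST networking mode (docker run --net=host)
sidesteps the mismatch, since the container then shares the host's
hostname, but it also gives up network isolation between containers.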
Still, these are just experiments, and a lot more thought needs to go
into making this work in a containerized environment. For example, what
about running multiple such NM containers per host?

> If that is not the case, this will have some interesting consequences
> for MapR users adopting Myriad.

Can you please elaborate on this? Currently Myriad doesn't launch NMs
inside Docker containers. Myriad launches NMs via Mesos, but as
physical processes.

> So far so good. But when you take this with the
> com.mapr.hadoop.mapred.LocalVolumeAuxService behaviour mentioned
> above, one would end up having to run MFS on both the baremetal slave
> and within the container it is hosting!

I think a cleaner solution would be to not require running MFS inside
the containers. Trying to do that has other consequences, especially
around the need to pass the host's disks into Docker containers for MFS
to format. The right way to do this is to modify createTTVolume.sh to
be container aware, along the lines of the sketch below.
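Purely as an illustration of what "container aware" could mean (this is
a hypothetical sketch, not an actual MapR patch), the script could
detect the container boundary and insist on an injected host identity:

    # Hypothetical container-awareness check for createTTVolume.sh.
    # Docker creates /.dockerenv inside every container it starts.
    if [ -f /.dockerenv ]; then
        # Inside a container: the host's FQDN must be injected, since
        # `hostname --fqdn` would return the container's own name.
        : "${NM_HOST_FQDN:?NM_HOST_FQDN must be set in a container}"
    else
        # On the physical host the local lookup is still correct.
        NM_HOST_FQDN="$(hostname --fqdn)"
    fi

That shape would also leave the door open for running multiple NM
containers per host, since each container could be handed its own
identity explicitly.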
On Fri, Oct 30, 2015 at 6:45 AM, Andre <[email protected]> wrote:
> Hi there,
>
> Apologies for the unusual post but in the absence of a user list I am
> sort of left without other options. :-)
>
> I have been keeping an eye on Myriad as we run a number of Hadoop
> nodes next to a bunch of Mesos-friendly use-cases and currently
> struggle to manage resourcing boundaries.
>
> I have asked this question to a number of different individuals at
> MapR but never managed to get a clear answer, therefore apologies for
> the somewhat MapR-focused question but....
>
> How would a MapR NM container connect back to YARN without running
> MFS inside the container?
>
> Reason I ask is simple:
>
> By default MapR NM nodes depend on MFS in order to create a local
> volume. This also applies to compute nodes, is explicitly documented
> behaviour, and can be seen in the logs in entries similar to the
> following:
>
> 2015-09-10 11:11:13,794 INFO
> com.mapr.hadoop.mapred.LocalVolumeAuxService: Checking for local
> volume. If volume is not present command will create and mount it.
>
> Failure to do so (using the default config) will usually (always?)
> result in a failure of the NM to start.
>
> I imagine this behaviour may be worked around but I could not find
> any documentation about it.
>
> If that is not the case, this will have some interesting consequences
> for MapR users adopting Myriad.
>
> Take for example users opting to follow an MFS-backed Mesos topology
> like the one described here:
>
> https://www.mapr.com/blog/my-experience-running-docker-containers-on-mesos
> https://www.mapr.com/solutions/zeta-enterprise-architecture
>
> My understanding is that a good number of baremetal servers would be
> running MFS, ensuring that resources (disk controllers and disk slots
> in this case) are utilised to their maximum potential.
>
> So far so good. But when you take this with the
> com.mapr.hadoop.mapred.LocalVolumeAuxService behaviour mentioned
> above, one would end up having to run MFS on both the baremetal slave
> and within the container it is hosting!
>
> This raises a few interesting questions:
>
> * How to deal with the MFS service and its memory gluttony?
>   Caching would potentially be happening on both baremetal and
>   containers...
>
> * Both the baremetal and containerized instances would have to
>   contact the CLDB in order to function well... Let's say this would
>   likely cause MapR share prices to go up. :-)
>
> Keen to understand from the MapR committers what is the view /
> workaround for createTTVolume.sh
>
> Kind regards
>
> Andre
