One other idea... could the use of cgroups (or not using them) affect this? I.e., I wonder whether it worked for me back when I was messing with cgroups settings. If I used cgroups, could it potentially act like a chroot on the sandbox, so that from the perspective of the executor it is running at the filesystem root with the proper permissions?

Also, since we are bundling everything there, if cgroups isn't the answer, could something like chroot help us trick YARN into thinking permissions are fine all the way up to root? (Once again, just brainstorming.)
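To make the chroot half of that idea concrete: this is pure hand-waving on my part, I have not tried it, and every path and directory name below is made up, but I am picturing something along these lines on the slave:

    # NEWROOT is a root-owned dir that becomes "/" from the executor's point of view, so the
    # ancestor-directory checks only ever see root-owned, non-world-writable directories.
    NEWROOT=/var/lib/myriad-root                                          # made-up location
    SANDBOX=/tmp/mesos/slaves/S1/frameworks/F1/executors/E1/runs/latest   # made-up sandbox path
    mkdir -p "$NEWROOT" && chown root:root "$NEWROOT" && chmod 755 "$NEWROOT"
    for d in bin lib lib64 usr etc opt; do                                # system dirs the NM still needs
      mkdir -p "$NEWROOT/$d" && mount --bind "/$d" "$NEWROOT/$d"
    done
    mkdir -p "$NEWROOT/hadoop"
    mount --bind "$SANDBOX/hadoop-2.7.0" "$NEWROOT/hadoop"                # the unpacked binary distro
    # the files inside the tgz would still need to be root-owned; this only changes what the
    # NM sees as "/" and its parents (and it is missing /proc, /dev, /tmp, env setup, etc.)
    chroot "$NEWROOT" /hadoop/bin/yarn nodemanager

No idea whether the Mesos containerizer would tolerate any of that, which is why I brought up cgroups in the first place.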
On Wed, Nov 18, 2015 at 7:56 AM, John Omernik <[email protected]> wrote:

> I understand that Hadoop (YARN) requires those permissions. My concern still stands in that I had it running at one point without this issue. I can't reproduce it now and am trying to figure that out (so at some point, the permissions I had set seemed to allow it to run, without changing the slaves); unless it was a mirage or some other quirk, it was running. Can others confirm that they have to change their Mesos setup in order to run Myriad? This seems very odd to me in that: A. If the permissions are wrong to the point where we can't run the framework, that is with Mesos running its default settings (writing to /tmp is the standard location for Mesos, and there have been no changes to /tmp permissions; they are drwxrwxrwt root:root, just like a standard install of Ubuntu 14.04). B. Everyone here who has run Myriad has therefore changed their default settings on Mesos. And C. that change to Mesos isn't in the documentation.

> In my environment I believe that it "ran" at some point (something I did with the executor tgz permissions helped it to work). That, combined with the fact that something so big (requiring a change on every Mesos slave in your cluster) and so critical to even getting Myriad to run (i.e., YARN won't run in the default Mesos setup) isn't in the documentation for getting Myriad running, makes me question whether there really is no workaround to this.

> So let me ask the group:

> 1. Did everyone here make the change in Mesos to get Myriad to run? If Myriad is running in your environment (it doesn't matter if you are running MapR or not) and you have not changed your default /tmp location for slave sandboxes, please share your /tmp ownership and permissions and let us know if it runs. If it did require a change, please let us know what you changed it to, and whether it affected any other frameworks. (That is my largest concern, in that changing permissions on this location is not a Myriad-only change.)

> 2. If everyone DID make this change, then we NEED to get this documented, because this will be a huge speed bump for people trying Myriad out and getting it running.

> Yuliya, I am not trying to be a pain, but as a user it is very strange to me that something so fundamental to the operation is not clear in the documentation. I just want to ensure we (probably mostly me) understand this completely before I go and make changes to every node in my Mesos cluster.

> On Tue, Nov 17, 2015 at 4:44 PM, yuliya Feldman <[email protected]> wrote:

>> Hadoop (not MapR) requires the whole path starting from "/" to be owned by root and writable only by root.

>> The second problem is exactly what I was talking about: the configuration taken from the RM overwrites the local one.

>> I can give you a patch to mitigate the issue for MapR if you are building from source.

>> Thanks, Yuliya
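Stepping outside the quote for a second: since everything keeps coming back to that path rule, here is the quick check I have been running on each node to see exactly which directory in the chain trips it. It is just GNU coreutils; the path to container-executor is only my guess at where the tgz lands in the sandbox, so substitute the real one:

    # print owner:group and octal mode for container-executor and every parent directory up to /
    f=/tmp/mesos/slaves/S1/frameworks/F1/executors/E1/runs/latest/hadoop-2.7.0/bin/container-executor
    stat -c '%U:%G %a %n' "$f"      # the binary itself
    d=$f
    while [ "$d" != / ]; do
      d=$(dirname "$d")
      stat -c '%U:%G %a %n' "$d"    # each ancestor directory, ending with /
    done

In my case that walk is what shows /tmp sitting at 1777, which is exactly what the check refuses.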
>> From: John Omernik <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, November 17, 2015 1:15 PM
>> Subject: Re: Struggling with Permissions

>> Well, sure, /tmp is world writable, but /tmp/mesos is not world writable, thus there is a sandbox to play in there... or am I missing something? Not to mention, my /tmp is rwt, which is world writable but only the creator or root can modify (based on the Googles).

>> Yuliya:

>> I am seeing a weird behavior with MapR as it relates to (I believe) the mapr_direct_shuffle.

>> In the Node Manager logs, I see things starting and it saying "Checking for local volume, if local volume is not present command will create and mount it".

>> Command invoked is: /opt/mapr/server/createTTVolume.sh hadoopmapr7.brewingintel.com /var/mapr/local/hadoopmapr2.brewingintel.com/mapred /var/mapr/local/hadoopmapr2.brewingintel.com/mapred/nodeManager yarn

>> What is interesting here is that hadoopmapr7 is the Node Manager it's trying to start on; however, the mount point it's trying to create is on hadoopmapr2, which is the node the Resource Manager happened to land on. I was very confused by that, because nowhere should hadoopmapr2 be "known" to the Node Manager, since it thinks the Resource Manager hostname is myriad.marathon.mesos.

>> So why was it hard-coding to the node the Resource Manager is running on?

>> Well, if I look at the conf file in the sandbox (the file that gets copied to be yarn-site.xml for the Node Managers), there ARE four references to hadoopmapr2. Three of the four have their source listed as "programatically" and one is just set: that's mapr.host. Could there be some downstream hinkyness going on with how MapR is setting hostnames? All of these values seem "wrong" in that mapr.host (on the Node Manager) should be hadoopmapr7 in this case, and the Resource Manager addresses should all be myriad.marathon.mesos. I'd be interested in your thoughts here, because I am stumped at how these are getting set.

>> <property><name>yarn.resourcemanager.address</name><value>hadoopmapr2:8032</value><source>programatically</source></property>
>> <property><name>mapr.host</name><value>hadoopmapr2.brewingintel.com</value></property>
>> <property><name>yarn.resourcemanager.resource-tracker.address</name><value>hadoopmapr2:8031</value><source>programatically</source></property>
>> <property><name>yarn.resourcemanager.admin.address</name><value>hadoopmapr2:8033</value><source>programatically</source></property>

>> On Tue, Nov 17, 2015 at 2:51 PM, Darin Johnson <[email protected]> wrote:

>> > Yuliya: Are you referencing yarn.nodemanager.hostname or a MapR-specific option?

>> > I'm working right now on passing a -Dyarn.nodemanager.hostname=offer.getHostName(). Useful if you've got extra IPs for a SAN or management network.

>> > John: Yeah, the permissions on the tarball are a pain to get right. I'm working on Docker support and a build script for the tarball, which should make things easier. Also, to the point of using world-writable directories: it's a little scary from the security side of things to allow executables to run there, especially things running as privileged users. Many distros of Linux will mount /tmp noexec.

>> > Darin

>> > On Tue, Nov 17, 2015 at 2:53 PM, yuliya Feldman <[email protected]> wrote:

>> > > Please change the work directory for the Mesos slave to one that is not /tmp and make sure that dir is owned by root.
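Quick aside on that suggestion: I assume the knob here is the slave's --work_dir flag. Host names and paths below are placeholders, and the /etc/mesos-slave part only applies if you installed from the Mesosphere packages:

    # move the slave work dir (and therefore the sandboxes) off /tmp to a root-owned dir
    mkdir -p /var/lib/mesos && chown root:root /var/lib/mesos && chmod 755 /var/lib/mesos
    # either pass the flag directly...
    mesos-slave --master=zk://zkhost:2181/mesos --work_dir=/var/lib/mesos
    # ...or, with the Mesosphere packages, drop it in the flag directory and restart the slave
    echo /var/lib/mesos > /etc/mesos-slave/work_dir

If anyone has done this, I'd like to hear whether it affected other frameworks on the same slaves.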
>> > > There is one more caveat with the binary distro and MapR: in the Myriad code, for the binary distro the configuration is copied from the RM to the NMs. That does not work for MapR, since we need the hostname (yes, for the sake of local volumes) to be unique. MapR will have a Myriad release to handle this situation.

>> > > From: John Omernik <[email protected]>
>> > > To: [email protected]
>> > > Sent: Tuesday, November 17, 2015 11:37 AM
>> > > Subject: Re: Struggling with Permissions

>> > > Oh hey, I found a post by me back on Sept 9. I looked at the JIRAs and followed the instructions, with the same errors. At this point do I still need to have a place where the entire path is owned by root? That seems like an odd requirement (a change on each node to facilitate a framework).

>> > > On Tue, Nov 17, 2015 at 1:25 PM, John Omernik <[email protected]> wrote:

>> > > > Hey all, I am struggling with permissions on Myriad, trying to get the right permissions in the tgz as well as figure out who to run as. I am running on MapR, which means I need to run as mapr or root (otherwise my volume creation scripts will fail on MapR; MapR folks, we should talk more about those scripts).

>> > > > But back to the code: I've had lots of issues. When I run the framework user and superuser as mapr, it unpacks everything as mapr and I get a "/bin/container-executor must be owned by root but is owned by 700" error (700 being my mapr UID).

>> > > > So now I am running as root, and I am getting the error below as it relates to /tmp. I am not sure which /tmp this refers to: the /tmp that my slave is executing in (i.e., my local Mesos agent's /tmp directory) or my MapR-FS /tmp directory (both of which are world writable, as /tmp typically is... or am I mistaken here)?

>> > > > Any thoughts on how to get this to resolve? This is when the Node Manager is trying to start, running as root, with root for both of my Myriad users.

>> > > > Thanks!

>> > > > Caused by: ExitCodeException exitCode=24: File /tmp must not be world or group writable, but is 1777
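One last thought, circling back to the container-executor error at the bottom of the thread. Going from the Hadoop LinuxContainerExecutor documentation (so double-check against the MapR build), the binary is expected to end up looking roughly like this, which is exactly what is hard to guarantee when it arrives inside a tgz that gets unpacked into the sandbox as a non-root user:

    # expected ownership/permissions for the secure container executor (illustrative path)
    chown root:hadoop /path/to/hadoop/bin/container-executor   # owned by root, group = the NM's group
    chmod 6050 /path/to/hadoop/bin/container-executor          # setuid/setgid, no access for "other"
    # container-executor.cfg has similar root-ownership rules, and every directory from "/"
    # down to the binary must be root-owned and not group- or world-writable, which is what
    # the "File /tmp must not be world or group writable, but is 1777" error is complaining about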
