You're most welcome! Just one more thing, then: please be aware that, if you are on a Mac running Cisco's VPN client, it messes with VBox's firewall rules for host-only networking and will cause "baffling behavior" :) (just thought I'd mention it - at another place I worked, it gave us a lot of grief until we found out)
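In case it's useful, setting up a host-only network by hand looks roughly like this (a sketch only - the interface name vboxnet0 and the VM name "mesos-slave1" are examples, substitute your own):

```shell
# Create a host-only interface on the host; VirtualBox auto-names it
# vboxnet0, vboxnet1, ... and the host gets a virtual NIC on that subnet.
VBoxManage hostonlyif create
VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1

# Attach the (powered-off) VM's first NIC to that subnet.
VBoxManage modifyvm "mesos-slave1" --nic1 hostonly --hostonlyadapter1 vboxnet0
```

Inside the guest you'd then configure a static IP on the same subnet (e.g. 192.168.56.11) so the host and the VMs can reach each other in both directions.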
*Marco Massenzio*
*Distributed Systems Engineer*

On Fri, May 29, 2015 at 12:03 AM, Alberto Rodriguez <[email protected]> wrote:

> Hi Marco,
>
> there is no need to apologize! Thank you very, very much for your detailed
> explanation. As you said, I tested it out NAT'ing the VMs but it didn't
> work. I'll try your solution when I've got some spare time and get back to
> the group to let you guys know whether it works.
>
> Thank you again!
>
> 2015-05-29 8:48 GMT+02:00 Marco Massenzio <[email protected]>:
>
>> Apologies in advance if you already know all this and are an expert on
>> vbox & networking - but maybe this either helps or at least points you in
>> the right direction (hopefully!)
>>
>> The problem is most likely that your laptop (or whatever box you're
>> running vbox on) has a hostname that's not DNS-resolvable (and probably
>> neither do your VMs).
>>
>> Further, by default, VBox configures the VM's NICs to be on a 'Bridged'
>> private subnet, which means that you can 'net out' (e.g., ping google.com
>> from the VM) but not get in (e.g., run a server accessible from outside
>> the VM).
>>
>> Mesos master/slave need to be able to talk to each other,
>> bi-directionally, which is possibly what was causing the issue in the
>> first place.
>>
>> NAT'ing the VMs probably won't work either (you won't know in advance
>> which port the Slave will be listening on - I think!)
>>
>> One option is to configure vbox's VMs to be on their own subnet (I forget
>> the exact terminology, it's been almost a year since I fiddled with it: I
>> think it's the Host-Only option
>> <https://www.virtualbox.org/manual/ch06.html#network_hostonly>) but
>> essentially vbox will create a subnet and act as a router - the host
>> machine will also have a virtual NIC in that subnet, so you'll be able to
>> route requests to/from the VMs.
>> There's also the fact that the Spark driver (pyspark, or spark-submit)
>> will need to be able to talk to the worker nodes, but that should "just
>> work" once you get Mesos to work.
>>
>> HTH,
>>
>> *Marco Massenzio*
>> *Distributed Systems Engineer*
>>
>> On Thu, May 28, 2015 at 11:13 PM, Alberto Rodriguez <[email protected]> wrote:
>>
>>> To be honest, I don't know what the problem was. I didn't manage to make
>>> my Spark jobs work on the Mesos cluster running on two virtual machines.
>>> I did manage to make them work when my Spark jobs and both the Mesos
>>> master and slaves all run on my local machine.
>>>
>>> I guess something is not working properly in the way VirtualBox assigns
>>> network interfaces to the virtual machines, but I can't spend more time
>>> on the issue.
>>>
>>> Thank you again for your help!
>>>
>>> 2015-05-28 19:28 GMT+02:00 Alex Rukletsov <[email protected]>:
>>>
>>>> Great! Mind sharing with the list what the problem was (for future
>>>> reference)?
>>>>
>>>> On Thu, May 28, 2015 at 5:25 PM, Alberto Rodriguez <[email protected]> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> I managed to make it work!! Finally, I'm running both the Mesos master
>>>>> and slave on my laptop and picking up the Spark jar from an HDFS
>>>>> installed in a VM. I've just launched a Spark job and it's working
>>>>> fine!
>>>>>
>>>>> Thank you very much for your help
>>>>>
>>>>> 2015-05-28 16:20 GMT+02:00 Alberto Rodriguez <[email protected]>:
>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> see below an extract of the Chronos log (not sure whether this is the
>>>>>> log you were talking about):
>>>>>>
>>>>>> 2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
>>>>>> 2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received resource offers (com.airbnb.scheduler.mesos.MesosJobFramework:87)
>>>>>> 2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
>>>>>> 2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received resource offers (com.airbnb.scheduler.mesos.MesosJobFramework:87)
>>>>>> 2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
>>>>>>
>>>>>> I'm using 0.20.1 because I'm using this Vagrant machine:
>>>>>> https://github.com/Banno/vagrant-mesos
>>>>>>
>>>>>> Kind regards and thank you again for your help
>>>>>>
>>>>>> 2015-05-28 14:09 GMT+02:00 Alex Rukletsov <[email protected]>:
>>>>>>
>>>>>>> Alberto,
>>>>>>>
>>>>>>> it looks like the Spark scheduler disconnects right after
>>>>>>> establishing the connection. Would you mind sharing the scheduler
>>>>>>> logs as well? Also, I see that you haven't specified
>>>>>>> failover_timeout; try setting this value to something meaningful
>>>>>>> (several hours for test purposes).
>>>>>>>
>>>>>>> And by the way, any reason you're still on Mesos 0.20.1?
>>>>>>> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> I do not know what's going on; now I'm unable to access the Spark
>>>>>>>> console again, it's hanging at the same point as before. See the
>>>>>>>> master logs below:
>>>>>>>>
>>>>>>>> 2015-05-27_15:30:53.68764 I0527 15:30:53.687494 944 master.cpp:3760] Sending 1 offers to framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at [email protected]:32768
>>>>>>>> 2015-05-27_15:30:53.69032 I0527 15:30:53.690196 942 master.cpp:2273] Processing ACCEPT call for offers: [ 20150527-152023-169978048-5050-876-O241 ] on slave 20150527-152023-169978048-5050-876-S0 at slave(1)@192.168.33.11:5051 (mesos-slave1) for framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at [email protected]:32768
>>>>>>>> 2015-05-27_15:30:53.69038 I0527 15:30:53.690300 942 hierarchical.hpp:648] Recovered mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000] (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0 from framework 20150527-100126-169978048-5050-1851-0001
>>>>>>>> 2015-05-27_15:30:54.00952 I0527 15:30:54.009363 937 master.cpp:1574] Received registration request for framework 'Spark shell' at [email protected]:55562
>>>>>>>> 2015-05-27_15:30:54.00957 I0527 15:30:54.009461 937 master.cpp:1638] Registering framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
>>>>>>>> 2015-05-27_15:30:54.00994 I0527 15:30:54.009703 937 hierarchical.hpp:321] Added framework 20150527-152023-169978048-5050-876-0026
>>>>>>>> 2015-05-27_15:30:54.00996 I0527 15:30:54.009826 937 master.cpp:3760] Sending 1 offers to framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
>>>>>>>> 2015-05-27_15:30:54.01035 I0527 15:30:54.010267 944 master.cpp:878] Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562 disconnected
>>>>>>>> 2015-05-27_15:30:54.01037 I0527 15:30:54.010308 944 master.cpp:1948] Disconnecting framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
>>>>>>>> 2015-05-27_15:30:54.01038 I0527 15:30:54.010326 944 master.cpp:1964] Deactivating framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
>>>>>>>> 2015-05-27_15:30:54.01053 I0527 15:30:54.010447 939 hierarchical.hpp:400] Deactivated framework 20150527-152023-169978048-5050-876-0026
>>>>>>>> 2015-05-27_15:30:54.01055 I0527 15:30:54.010459 944 master.cpp:900] Giving framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562 0ns to failover
>>>>>>>>
>>>>>>>> Kind regards and thank you very much for your help!!
>>>>>>>> 2015-05-27 16:28 GMT+02:00 Alex Rukletsov <[email protected]>:
>>>>>>>>
>>>>>>>>> Alberto,
>>>>>>>>>
>>>>>>>>> would you mind providing the slave and master logs (or the
>>>>>>>>> appropriate parts of them)? Have you specified the --work_dir flag
>>>>>>>>> for your Mesos workers?
>>>>>>>>>
>>>>>>>>> On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Alex,
>>>>>>>>>>
>>>>>>>>>> Thank you for replying. I managed to fix the first problem, but
>>>>>>>>>> now when I launch a Spark job through my console, Mesos is losing
>>>>>>>>>> all the tasks. I can see them all in my Mesos slave, but their
>>>>>>>>>> status is LOST. The stderr & stdout files of the tasks are both
>>>>>>>>>> empty.
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>>
>>>>>>>>>> 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> Alberto,
>>>>>>>>>>>
>>>>>>>>>>> What may be happening in your case is that the Master is not
>>>>>>>>>>> able to talk to your scheduler. When responding to a scheduler,
>>>>>>>>>>> the Mesos Master doesn't use the IP from which a request came,
>>>>>>>>>>> but rather the IP set in the "Libprocess-from" field. That's
>>>>>>>>>>> exactly what you specify in the LIBPROCESS_IP env var prior to
>>>>>>>>>>> starting your scheduler. Could you please double check that it
>>>>>>>>>>> is set up correctly and that the IP is reachable for the Mesos
>>>>>>>>>>> Master?
>>>>>>>>>>>
>>>>>>>>>>> In case you are not able to solve the problem, please provide
>>>>>>>>>>> the scheduler and Master logs together with the master,
>>>>>>>>>>> zookeeper, and scheduler configurations.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I managed to get a Mesos cluster up & running on an Ubuntu VM.
>>>>>>>>>>>> I've also been able to run and connect a spark-shell from this
>>>>>>>>>>>> machine and it works properly.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, when I try to connect from the host machine
>>>>>>>>>>>> where the VM is running to launch Spark jobs, I cannot.
>>>>>>>>>>>>
>>>>>>>>>>>> See below the Spark console output:
>>>>>>>>>>>>
>>>>>>>>>>>> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
>>>>>>>>>>>> Type in expressions to have them evaluated.
>>>>>>>>>>>> Type :help for more information.
>>>>>>>>>>>> 15/05/25 18:13:00 INFO SecurityManager: Changing view acls to: arodriguez
>>>>>>>>>>>> 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls to: arodriguez
>>>>>>>>>>>> 15/05/25 18:13:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(arodriguez); users with modify permissions: Set(arodriguez)
>>>>>>>>>>>> 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
>>>>>>>>>>>> 15/05/25 18:13:01 INFO Remoting: Starting remoting
>>>>>>>>>>>> 15/05/25 18:13:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:47229]
>>>>>>>>>>>> 15/05/25 18:13:01 INFO Utils: Successfully started service 'sparkDriver' on port 47229.
>>>>>>>>>>>> 15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
>>>>>>>>>>>> 15/05/25 18:13:01 INFO SparkEnv: Registering BlockManagerMaster
>>>>>>>>>>>> 15/05/25 18:13:01 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150525181301-7fa8
>>>>>>>>>>>> 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
>>>>>>>>>>>> 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>>>>>>>> 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server directory is /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
>>>>>>>>>>>> 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
>>>>>>>>>>>> 15/05/25 18:13:01 INFO Utils: Successfully started service 'HTTP file server' on port 51659.
>>>>>>>>>>>> 15/05/25 18:13:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>>>>>>>>>>>> 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at http://localhost.localdomain:4040
>>>>>>>>>>>> WARNING: Logging before InitGoogleLogging() is written to STDERR
>>>>>>>>>>>> W0525 18:13:01.749449 10908 sched.cpp:1323]
>>>>>>>>>>>> **************************************************
>>>>>>>>>>>> Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
>>>>>>>>>>>> **************************************************
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.6
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@716: Client environment:host.name=localhost.localdomain
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@724: Client environment:os.arch=3.19.7-200.fc21.x86_64
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC 2015
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@733: Client environment:user.name=arodriguez
>>>>>>>>>>>> I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@741: Client environment:user.home=/home/arodriguez
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
>>>>>>>>>>>> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.141.141.10:2181 sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0 sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
>>>>>>>>>>>> 2015-05-25 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1705: initiated connection to server [10.141.141.10:2181]
>>>>>>>>>>>> 2015-05-25 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1752: session establishment complete on server [10.141.141.10:2181], sessionId=0x14d8babef360022, negotiated timeout=10000
>>>>>>>>>>>> I0525 18:13:01.752760 10913 group.cpp:313] Group process (group(1)@127.0.0.1:48557) connected to ZooKeeper
>>>>>>>>>>>> I0525 18:13:01.752787 10913 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>>>>>>>>>>>> I0525 18:13:01.752807 10913 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
>>>>>>>>>>>> I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new leader: (id='16')
>>>>>>>>>>>> I0525 18:13:01.754408 10913 group.cpp:659] Trying to get '/mesos/info_0000000016' in ZooKeeper
>>>>>>>>>>>> I0525 18:13:01.755056 10913 detector.cpp:452] A new leading master ([email protected]:5050) is detected
>>>>>>>>>>>> I0525 18:13:01.755113 10911 sched.cpp:254] New master detected at [email protected]:5050
>>>>>>>>>>>> I0525 18:13:01.755345 10911 sched.cpp:264] No credentials provided. Attempting to register without authentication
>>>>>>>>>>>>
>>>>>>>>>>>> It hangs at the last line.
>>>>>>>>>>>>
>>>>>>>>>>>> I've tried to set the LIBPROCESS_IP env variable with no luck.
>>>>>>>>>>>>
>>>>>>>>>>>> Any advice?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Alberto
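For future reference, the LIBPROCESS_IP suggestion that comes up a few times in this thread amounts to exporting a routable (non-loopback) address before starting the driver - a sketch only, where 192.168.33.1 is an example of the host's IP on the VMs' subnet; use whatever routable address your host actually has:

```shell
# Make libprocess bind/advertise a routable address instead of 127.0.0.1,
# so the Mesos master can talk back to the scheduler.
export LIBPROCESS_IP=192.168.33.1

# Then launch the Spark shell against the Mesos master, e.g.:
./bin/spark-shell --master mesos://zk://10.141.141.10:2181/mesos
```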
