Alberto, would you mind providing the slave and master logs (or the relevant parts of them)? Also, have you specified the --work_dir flag for your Mesos slaves?
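For anyone who hits the earlier registration hang quoted below (the "Scheduler driver bound to loopback interface!" warning), here is a rough Python sketch for sanity-checking the address you plan to export as LIBPROCESS_IP before starting the scheduler. It is only an illustration: the function name `usable_for_libprocess` and the address 10.141.141.1 are made up for the example, it uses nothing but Python's standard `ipaddress` module, and it neither talks to Mesos nor proves that the master can actually reach the address.

```python
import ipaddress


def usable_for_libprocess(ip: str) -> bool:
    """Return True if `ip` is a concrete, non-loopback address.

    Hypothetical helper for illustration only; it is not part of Mesos.
    """
    addr = ipaddress.ip_address(ip)
    # Loopback (127.0.0.0/8, ::1) and unspecified (0.0.0.0, ::) addresses
    # cannot be reached by a master running on another machine.
    return not (addr.is_loopback or addr.is_unspecified)


if __name__ == "__main__":
    # 127.0.0.1 is the address the scheduler bound to in the quoted log;
    # 10.141.141.1 is an assumed host-side address, not taken from the thread.
    for candidate in ("127.0.0.1", "10.141.141.1"):
        verdict = "looks usable" if usable_for_libprocess(candidate) else "not usable"
        print(f"LIBPROCESS_IP={candidate}: {verdict}")
```

An address that passes this check still has to be reachable from the machine running the Mesos master, so it is worth confirming that separately (for example with ping from the VM).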

On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <[email protected]> wrote:

> Hi Alex,
>
> Thank you for replying. I managed to fix the first problem, but now when I
> launch a Spark job through my console, Mesos is losing all the tasks. I can
> see them all in my Mesos slave but their status is LOST. The stderr &
> stdout files of the tasks are both empty.
>
> Any ideas?
>
> 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <[email protected]>:
>
> > Alberto,
> >
> > What may be happening in your case is that Master is not able to talk to
> > your scheduler. When responding to a scheduler, Mesos Master doesn't use
> > the IP from which a request came, but rather an IP set in the
> > "Libprocess-from" field instead. That's exactly what you specify in the
> > LIBPROCESS_IP env var prior to starting your scheduler. Could you please
> > double check that it is set up correctly and that the IP is reachable
> > from the Mesos Master?
> >
> > In case you are not able to solve the problem, please provide scheduler
> > and Master logs together with master, zookeeper, and scheduler
> > configurations.
> >
> >
> > On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > I managed to get a Mesos cluster up & running on an Ubuntu VM. I've
> > > also been able to run and connect a spark-shell from this machine and
> > > it works properly.
> > >
> > > Unfortunately, I'm trying to connect from the host machine where the
> > > VM is running to launch Spark jobs and I cannot.
> > >
> > > See below the Spark console output:
> > >
> > > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
> > > Type in expressions to have them evaluated.
> > > Type :help for more information.
> > > 15/05/25 18:13:00 INFO SecurityManager: Changing view acls to: arodriguez
> > > 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls to: arodriguez
> > > 15/05/25 18:13:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(arodriguez); users with modify permissions: Set(arodriguez)
> > > 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
> > > 15/05/25 18:13:01 INFO Remoting: Starting remoting
> > > 15/05/25 18:13:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:47229]
> > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'sparkDriver' on port 47229.
> > > 15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
> > > 15/05/25 18:13:01 INFO SparkEnv: Registering BlockManagerMaster
> > > 15/05/25 18:13:01 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150525181301-7fa8
> > > 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> > > 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server directory is /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
> > > 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
> > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'HTTP file server' on port 51659.
> > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> > > 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at http://localhost.localdomain:4040
> > > WARNING: Logging before InitGoogleLogging() is written to STDERR
> > > W0525 18:13:01.749449 10908 sched.cpp:1323]
> > > **************************************************
> > > Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
> > > **************************************************
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.6
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@716: Client environment:host.name=localhost.localdomain
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@724: Client environment:os.arch=3.19.7-200.fc21.x86_64
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC 2015
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@733: Client environment:user.name=arodriguez
> > > I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@741: Client environment:user.home=/home/arodriguez
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.141.141.10:2181 sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0 sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> > > 2015-05-25 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1705: initiated connection to server [10.141.141.10:2181]
> > > 2015-05-25 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1752: session establishment complete on server [10.141.141.10:2181], sessionId=0x14d8babef360022, negotiated timeout=10000
> > > I0525 18:13:01.752760 10913 group.cpp:313] Group process (group(1)@127.0.0.1:48557) connected to ZooKeeper
> > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
> > > I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new leader: (id='16')
> > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to get '/mesos/info_0000000016' in ZooKeeper
> > > I0525 18:13:01.755056 10913 detector.cpp:452] A new leading master ([email protected]:5050) is detected
> > > I0525 18:13:01.755113 10911 sched.cpp:254] New master detected at [email protected]:5050
> > > I0525 18:13:01.755345 10911 sched.cpp:264] No credentials provided. Attempting to register without authentication
> > >
> > >
> > > It hangs at the last line.
> > >
> > > I've tried to set the LIBPROCESS_IP env variable with no luck.
> > >
> > > Any advice?
> > >
> > > Thank you in advance.
> > >
> > > Kind regards,
> > >
> > > Alberto
> > >
> >
