GitHub user rmetzger commented on the pull request:
https://github.com/apache/flink/pull/948#issuecomment-148403603
I tried running the code from this pull request again, this time using the
`mesos-playa` Vagrant image, and it does not work for me.
I was following your instructions.
When did you last test the changes?
My motivation to test this pull request goes down every time I test it.
So far I've spun up a Mesos cluster on GCE twice, plus the VM now.
Maybe I'm doing something wrong. Please let me know what I can do to get it to run.
CLI output:
```
vagrant@mesos:~/flink/build-target$ java
-Dlog4j.configuration=file://`pwd`/conf/log4j.properties -Dlog.file=logs.log
-cp lib/flink-dist-0.10-SNAPSHOT.jar
org.apache.flink.mesos.scheduler.FlinkScheduler --confDir conf/
I1015 14:05:01.591161 9992 sched.cpp:157] Version: 0.22.1
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@712: Client
environment:zookeeper.version=zookeeper C client 3.4.5
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@716: Client
environment:host.name=mesos
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@723: Client
environment:os.name=Linux
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@724: Client
environment:os.arch=3.16.0-30-generic
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@725: Client
environment:os.version=#40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@733: Client
environment:user.name=vagrant
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@741: Client
environment:user.home=/home/vagrant
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@log_env@753: Client
environment:user.dir=/home/vagrant/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT
2015-10-15 14:05:01,592:9991(0x7f67cffff700):ZOO_INFO@zookeeper_init@786:
Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000
watcher=0x7f67dac33a60 sessionId=0 sessionPasswd=<null> context=0x7f67f0004470
flags=0
2015-10-15 14:05:01,592:9991(0x7f67c6ffd700):ZOO_INFO@check_events@1703:
initiated connection to server [127.0.0.1:2181]
Embedded server listening at
http://127.0.0.1:40815
Press any key to stop.
2015-10-15 14:05:04,959:9991(0x7f67c6ffd700):ZOO_INFO@check_events@1750:
session establishment complete on server [127.0.0.1:2181],
sessionId=0x1506b6312fa000b, negotiated timeout=10000
I1015 14:05:04.959841 10024 group.cpp:313] Group process
(group(1)@127.0.1.1:57437) connected to ZooKeeper
I1015 14:05:04.959899 10024 group.cpp:790] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I1015 14:05:04.959928 10024 group.cpp:385] Trying to create path '/mesos'
in ZooKeeper
I1015 14:05:05.204282 10024 detector.cpp:138] Detected a new leader:
(id='2')
I1015 14:05:05.204489 10024 group.cpp:659] Trying to get
'/mesos/info_0000000002' in ZooKeeper
I1015 14:05:05.303072 10024 detector.cpp:452] A new leading master
([email protected]:5050) is detected
I1015 14:05:05.303467 10024 sched.cpp:254] New master detected at
[email protected]:5050
I1015 14:05:05.303890 10024 sched.cpp:264] No credentials provided.
Attempting to register without authentication
I1015 14:05:05.851562 10024 sched.cpp:448] Framework registered with
20151015-120419-16842879-5050-1244-0000
```
Log file content:
```
14:04:54,564 WARN org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
-
--------------------------------------------------------------------------------
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Starting JobManager (Version: 0.10-SNAPSHOT, Rev:d905af0,
Date:06.10.2015 @ 19:37:22 UTC)
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Current user: vagrant
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.79-b02
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Maximum heap size: 592 MiBytes
14:04:55,763 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- JAVA_HOME: (not set)
14:04:55,823 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Hadoop version: 2.3.0
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- JVM Options:
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
-
-Dlog4j.configuration=file:///home/vagrant/flink/build-target/conf/log4j.properties
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- -Dlog.file=logs.log
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Program Arguments:
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- --confDir
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- conf/
14:04:55,824 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
-
--------------------------------------------------------------------------------
14:04:55,875 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Maximum number of open file descriptors is 4096
14:04:55,875 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Loading configuration from
/home/vagrant/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT/conf
14:04:58,375 INFO org.apache.flink.runtime.jobmanager.JobManager
- Starting JobManager
14:04:58,377 INFO org.apache.flink.runtime.jobmanager.JobManager
- Starting JobManager actor system at localhost:6123.
14:04:59,700 INFO org.eclipse.jetty.util.log
- jetty-0.10-SNAPSHOT
14:05:01,985 INFO org.eclipse.jetty.util.log
- Started [email protected]:40815
14:05:07,698 INFO akka.event.slf4j.Slf4jLogger
- Slf4jLogger started
14:05:07,750 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Accepting
14:05:07,960 INFO Remoting
- Starting remoting
14:05:09,241 INFO Remoting
- Remoting started; listening on addresses
:[akka.tcp://[email protected]:6123]
14:05:09,248 INFO org.apache.flink.runtime.jobmanager.JobManager
- Starting JobManager actor
14:05:09,597 INFO org.apache.flink.runtime.blob.BlobServer
- Created BLOB server storage directory
/tmp/blobStore-9b7614f7-7d0d-4c5e-b4c6-911f0ab845ef
14:05:09,597 INFO org.apache.flink.runtime.blob.BlobServer
- Started BLOB server at 0.0.0.0:40000 - max concurrent requests: 50 - max
backlog: 1000
14:05:10,470 INFO org.apache.flink.runtime.jobmanager.JobManager
- Starting JobManager at akka.tcp://[email protected]:6123/user/jobmanager.
14:05:10,471 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist
- Started memory archivist akka://flink/user/archive
14:05:10,563 INFO org.apache.flink.runtime.jobmanager.JobManager
- JobManager akka.tcp://[email protected]:6123/user/jobmanager was granted
leadership with leader session ID None.
14:05:10,593 INFO org.apache.flink.runtime.jobmanager.JobManager
- Starting JobManger web frontend
14:05:10,735 INFO org.apache.flink.runtime.jobmanager.web.WebInfoServer
- Setting up web info server, using web-root directory
jar:file:/home/vagrant/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT/lib/flink-dist-0.10-SNAPSHOT.jar!/web-docs-infoserver.
14:05:11,162 INFO org.eclipse.jetty.util.log
- jetty-0.10-SNAPSHOT
14:05:11,165 INFO org.eclipse.jetty.util.log
- Started [email protected]:8081
14:05:11,166 INFO org.apache.flink.runtime.jobmanager.web.WebInfoServer
- Started web info server for JobManager on 0.0.0.0:8081
14:05:14,936 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Declining offer(s) from slave 20151015-120419-16842879-5050-1244-S0
offered [cpus: 1.5 | mem : 488.0 | disk: 33044.0] required [cpus: 0.5 | mem:
512.0 | disk: 1024.0]
14:05:15,948 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- statusUpdate received from taskId: TaskManager_1 slaveId:
20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:15,948 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Lost taskManager with TaskId: TaskManager_1 on slave:
20151015-120419-16842879-5050-1244-S0
14:05:16,939 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Accepting
14:05:17,092 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- statusUpdate received from taskId: TaskManager_2 slaveId:
20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:17,092 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Lost taskManager with TaskId: TaskManager_2 on slave:
20151015-120419-16842879-5050-1244-S0
14:05:17,939 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Accepting
14:05:18,096 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- statusUpdate received from taskId: TaskManager_3 slaveId:
20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:18,096 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Lost taskManager with TaskId: TaskManager_3 on slave:
20151015-120419-16842879-5050-1244-S0
14:05:18,940 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Accepting
14:05:19,112 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- statusUpdate received from taskId: TaskManager_4 slaveId:
20151015-120419-16842879-5050-1244-S0 [TASK_LOST]
14:05:19,113 INFO org.apache.flink.mesos.scheduler.FlinkScheduler$
- Lost taskManager with TaskId: TaskManager_4 on slave:
20151015-120419-16842879-5050-1244-S0
.... this goes on forever? ...
```
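The `Declining offer(s)` line in the log above prints the offered and required resources side by side. As a minimal sketch (the helper name `insufficient_resources` is hypothetical and not part of Flink or Mesos), the comparison can be reproduced to see which resource fell short:

```python
import re

# Hypothetical helper, not part of Flink or Mesos: it parses the
# "offered [...] required [...]" portion of a scheduler log message.
_BRACKET = re.compile(r"(offered|required)\s*\[([^\]]*)\]")
_PAIR = re.compile(r"(\w+)\s*:\s*([\d.]+)")


def insufficient_resources(log_line):
    """Return {resource: (offered, required)} for each resource that falls short."""
    # Collapse line wraps so a message split across lines still parses.
    text = " ".join(log_line.split())
    groups = {
        kind: {name: float(value) for name, value in _PAIR.findall(body)}
        for kind, body in _BRACKET.findall(text)
    }
    offered = groups.get("offered", {})
    required = groups.get("required", {})
    return {
        name: (offered.get(name, 0.0), need)
        for name, need in required.items()
        if offered.get(name, 0.0) < need
    }


# Message copied from the log above.
line = (
    "Declining offer(s) from slave 20151015-120419-16842879-5050-1244-S0 "
    "offered [cpus: 1.5 | mem : 488.0 | disk: 33044.0] "
    "required [cpus: 0.5 | mem: 512.0 | disk: 1024.0]"
)
print(insufficient_resources(line))  # {'mem': (488.0, 512.0)}
```

For the declined offer in this log, memory (488.0 offered, 512.0 required) is the only resource below the requirement.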
Mesos file `mesos-slave.WARNING`:
```
Log file created at: 2015/10/15 12:04:40
Running on machine: mesos
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W1015 12:04:40.464870 1310 slave.cpp:1934] Ignoring updating pid for
framework 20151007-005549-16842879-5050-1191-0001 because it does not exist
W1015 12:05:08.030145 1313 slave.cpp:1934] Ignoring updating pid for
framework 20151007-005549-16842879-5050-1191-0000 because it does not exist
E1015 14:05:14.378486 1312 slave.cpp:3112] Container
'74dc3694-16ec-470f-88c6-b06b7f295682' for executor 'executor_1' of framework
'20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs
for container '74dc3694-16ec-470f-88c6-b06b7f295682'with exit status: 256
E1015 14:05:15.768391 1315 slave.cpp:3461] Failed to unmonitor container
for executor executor_1 of framework 20151015-120419-16842879-5050-1244-0000:
Not monitored
W1015 14:05:15.851459 1312 containerizer.cpp:814] Ignoring update for
unknown container: 74dc3694-16ec-470f-88c6-b06b7f295682
E1015 14:05:16.989680 1307 slave.cpp:3112] Container
'2af2d3c0-e30c-4405-9ff1-7f4389bb62e9' for executor 'executor_2' of framework
'20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs
for container '2af2d3c0-e30c-4405-9ff1-7f4389bb62e9'with exit status: 256
E1015 14:05:17.090631 1312 slave.cpp:3461] Failed to unmonitor container
for executor executor_2 of framework 20151015-120419-16842879-5050-1244-0000:
Not monitored
W1015 14:05:17.091418 1305 containerizer.cpp:814] Ignoring update for
unknown container: 2af2d3c0-e30c-4405-9ff1-7f4389bb62e9
E1015 14:05:17.993669 1310 slave.cpp:3112] Container
'8cbc46f8-3200-4f9b-9134-099a0f6f3541' for executor 'executor_3' of framework
'20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs
for container '8cbc46f8-3200-4f9b-9134-099a0f6f3541'with exit status: 256
E1015 14:05:18.095177 1310 slave.cpp:3461] Failed to unmonitor container
for executor executor_3 of framework 20151015-120419-16842879-5050-1244-0000:
Not monitored
W1015 14:05:18.095211 1310 containerizer.cpp:814] Ignoring update for
unknown container: 8cbc46f8-3200-4f9b-9134-099a0f6f3541
E1015 14:05:19.006584 1305 slave.cpp:3112] Container
'aca9e80a-5a34-4c29-a123-f025dc4946fe' for executor 'executor_4' of framework
'20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs
for container 'aca9e80a-5a34-4c29-a123-f025dc4946fe'with exit status: 256
```
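Every failed executor in the excerpt above reports the same `Failed to fetch URIs` error. A rough sketch for counting those failures per executor follows; the function name `fetch_failures` is made up for illustration, and the regular expression assumes the exact message wording shown in this excerpt, which other Mesos versions may not share:

```python
import re
from collections import Counter

# Assumed message shape, taken from the excerpt above.
_FETCH_FAIL = re.compile(
    r"for executor '([^']+)' of framework '([^']+)' failed to start: "
    r"Failed to fetch URIs"
)


def fetch_failures(log_text):
    """Count 'Failed to fetch URIs' errors per executor ID."""
    # Normalize whitespace so messages wrapped across lines still match.
    text = " ".join(log_text.split())
    return Counter(match.group(1) for match in _FETCH_FAIL.finditer(text))


# Two entries copied from the excerpt, including the original line wraps.
sample = """
E1015 14:05:14.378486 1312 slave.cpp:3112] Container
'74dc3694-16ec-470f-88c6-b06b7f295682' for executor 'executor_1' of framework
'20151015-120419-16842879-5050-1244-0000' failed to start: Failed to fetch URIs
for container '74dc3694-16ec-470f-88c6-b06b7f295682'with exit status: 256
E1015 14:05:15.768391 1315 slave.cpp:3461] Failed to unmonitor container
for executor executor_1 of framework 20151015-120419-16842879-5050-1244-0000:
Not monitored
"""
print(dict(fetch_failures(sample)))  # {'executor_1': 1}
```

The `Failed to unmonitor container` entries are not counted, since they follow from the earlier fetch failure rather than being separate launch attempts.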
I cannot find any log files for the TaskManager.