David Robinson created MESOS-712:
------------------------------------
Summary: invalid zhandle state
Key: MESOS-712
URL: https://issues.apache.org/jira/browse/MESOS-712
Project: Mesos
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: David Robinson
{noformat:title=log snippet}
2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16533ms
2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.0.1:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 13199ms)
I0929 08:58:17.544836 45283 cgroups.cpp:1193] Trying to freeze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
2013-09-29 08:58:30,474:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1141: Calling a watcher for a ZOO_SESSION_EVENT and the state=CONNECTING_STATE
2013-09-29 08:58:30,475:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16564ms
2013-09-29 08:58:30,475:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
I0929 08:58:30.445508 45282 detector.cpp:251] Trying to create path '/home/mesos/prod/master' in ZooKeeper
2013-09-29 08:58:30,483:45279(0x7f9024e3f940):ZOO_INFO@check_events@1585: initiated connection to server [192.168.0.2:2181]
2013-09-29 08:58:30,488:45279(0x7f9031267940):ZOO_DEBUG@zoo_awexists@2587: Sending request xid=0x5244d598 for path [/home/mesos/prod/master] to 192.168.0.2:2181
2013-09-29 08:58:30,488:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1621: Socket [192.168.0.2:2181] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x340523200364932 has expired.
2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1138: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_EXPIRED_SESSION_STATE
2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@do_io@317: IO thread terminated
2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1784: Calling COMPLETION_STAT for xid=0x5244d598 rc=-112
I0929 08:58:30.475751 45283 cgroups.cpp:1232] Successfully froze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738 after 1 attempts
F0929 08:58:30.492090 45282 detector.cpp:266] Failed to create '/home/mesos/prod/master' in ZooKeeper: invalid zhandle state
*** Check failure stack trace: ***
I0929 08:58:30.492761 45292 cgroups.cpp:1208] Trying to thaw cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:31.144810 45291 cgroups_isolator.cpp:937] Executor thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 terminated with status 9
I0929 08:58:32.791193 45292 cgroups.cpp:1318] Successfully thawed /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:33.675348 45298 cgroups_isolator.cpp:1275] Successfully destroyed cgroup mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
I0929 08:58:33.676269 45300 slave.cpp:2158] Executor 'thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f' of framework 201205082337-0000000003-0000 has terminated with signal Killed
I0929 08:58:33.678154 45300 slave.cpp:1778] Handling status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 from @0.0.0.0:0
I0929 08:58:33.679175 45288 cgroups_isolator.cpp:700] Asked to update resources for an unknown/killed executor
I0929 08:58:33.679201 45300 status_update_manager.cpp:300] Received status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000
I0929 08:58:33.680452 45300 status_update_manager.hpp:337] Checkpointing UPDATE for status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000
@ 0x7f9035fb562d google::LogMessage::Fail()
@ 0x7f9035fb9617 google::LogMessage::SendToLog()
@ 0x7f9035fb7f14 google::LogMessage::Flush()
I0929 08:58:35.929435 45300 status_update_manager.cpp:351] Forwarding status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 to [email protected]:5050
@ 0x7f9035fb8146 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9035d1a83f mesos::internal::ZooKeeperMasterDetectorProcess::connected()
@ 0x7f9035d1f118 std::tr1::_Function_handler<>::_M_invoke()
@ 0x7f9035d21b84 std::tr1::_Function_handler<>::_M_invoke()
@ 0x7f9035ea6f84 process::ProcessManager::resume()
@ 0x7f9035ea79df process::schedule()
@ 0x7f903561083d start_thread
@ 0x7f9033ff2f8d clone
{noformat}
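The slave exited with SIGABRT. The log shows the ZooKeeper session expiring (rc=-112), after which the detector's attempt to create '/home/mesos/prod/master' failed with "invalid zhandle state" and hit the fatal check at detector.cpp:266. Is this a ZooKeeper connection issue? Should Mesos handle this gracefully instead of aborting?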
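
To make the "handle this gracefully" question concrete, here is a minimal sketch of one possible policy. It is not the actual detector code and not a proposed patch. The constants mirror return codes from the ZooKeeper C client's zookeeper.h (the -7 and -112 values also appear in the log above). The `Action` enum and `classify()` function are hypothetical names introduced only for illustration. The assumption is that an expired session or an invalid handle calls for opening a new session rather than a fatal check.

```cpp
#include <string>

// Local mirrors of return codes from the ZooKeeper C client (zookeeper.h).
// They are redefined here only so the sketch compiles without the library.
namespace zk {
constexpr int kOk = 0;                 // ZOK
constexpr int kConnectionLoss = -4;    // ZCONNECTIONLOSS
constexpr int kOperationTimeout = -7;  // ZOPERATIONTIMEOUT
constexpr int kInvalidState = -9;      // ZINVALIDSTATE ("invalid zhandle state")
constexpr int kNodeExists = -110;      // ZNODEEXISTS
constexpr int kSessionExpired = -112;  // ZSESSIONEXPIRED
}  // namespace zk

// Hypothetical decision type: what a caller could do after a failed
// "create path" attempt instead of aborting the process.
enum class Action {
  Proceed,    // the path exists (or was just created); carry on
  Retry,      // transient problem; retry on the same handle
  Reconnect,  // handle is unusable; open a new session, then retry
  Abort       // unexpected error; failing fast is still reasonable
};

// Maps a ZooKeeper return code to a recovery action.
Action classify(int rc) {
  switch (rc) {
    case zk::kOk:
    case zk::kNodeExists:
      return Action::Proceed;
    case zk::kConnectionLoss:
    case zk::kOperationTimeout:
      return Action::Retry;
    case zk::kSessionExpired:
    case zk::kInvalidState:
      return Action::Reconnect;
    default:
      return Action::Abort;
  }
}

// Human-readable name for logging the chosen action.
std::string describe(Action action) {
  switch (action) {
    case Action::Proceed:   return "proceed";
    case Action::Retry:     return "retry";
    case Action::Reconnect: return "reconnect";
    case Action::Abort:     return "abort";
  }
  return "unknown";
}
```

Under this policy, the failure in the log (an invalid handle state following session expiry) would map to `Action::Reconnect` rather than terminating the slave. Whether reconnecting is safe for the detector's leader-election state is a separate question that this sketch does not address.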