[ https://issues.apache.org/jira/browse/MESOS-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone resolved MESOS-712.
------------------------------

    Resolution: Duplicate

> invalid zhandle state
> ---------------------
>
>                 Key: MESOS-712
>                 URL: https://issues.apache.org/jira/browse/MESOS-712
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: David Robinson
>
> {noformat:title=log snippet}
> 2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16533ms
> 2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.0.1:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 13199ms)
> I0929 08:58:17.544836 45283 cgroups.cpp:1193] Trying to freeze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> 2013-09-29 08:58:30,474:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1141: Calling a watcher for a ZOO_SESSION_EVENT and the state=CONNECTING_STATE
> 2013-09-29 08:58:30,475:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16564ms
> 2013-09-29 08:58:30,475:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> I0929 08:58:30.445508 45282 detector.cpp:251] Trying to create path '/home/mesos/prod/master' in ZooKeeper
> 2013-09-29 08:58:30,483:45279(0x7f9024e3f940):ZOO_INFO@check_events@1585: initiated connection to server [192.168.0.2:2181]
> 2013-09-29 08:58:30,488:45279(0x7f9031267940):ZOO_DEBUG@zoo_awexists@2587: Sending request xid=0x5244d598 for path [/home/mesos/prod/master] to 192.168.0.2:2181
> 2013-09-29 08:58:30,488:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1621: Socket [192.168.0.2:2181] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x340523200364932 has expired.
> 2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1138: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_EXPIRED_SESSION_STATE
> 2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@do_io@317: IO thread terminated
> 2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> 2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1784: Calling COMPLETION_STAT for xid=0x5244d598 rc=-112
> I0929 08:58:30.475751 45283 cgroups.cpp:1232] Successfully froze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738 after 1 attempts
> F0929 08:58:30.492090 45282 detector.cpp:266] Failed to create '/home/mesos/prod/master' in ZooKeeper: invalid zhandle state
> *** Check failure stack trace: ***
> I0929 08:58:30.492761 45292 cgroups.cpp:1208] Trying to thaw cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> I0929 08:58:31.144810 45291 cgroups_isolator.cpp:937] Executor thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 terminated with status 9
> I0929 08:58:32.791193 45292 cgroups.cpp:1318] Successfully thawed /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> I0929 08:58:33.675348 45298 cgroups_isolator.cpp:1275] Successfully destroyed cgroup mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> I0929 08:58:33.676269 45300 slave.cpp:2158] Executor 'thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f' of framework 201205082337-0000000003-0000 has terminated with signal Killed
> I0929 08:58:33.678154 45300 slave.cpp:1778] Handling status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 from @0.0.0.0:0
> I0929 08:58:33.679175 45288 cgroups_isolator.cpp:700] Asked to update resources for an unknown/killed executor
> I0929 08:58:33.679201 45300 status_update_manager.cpp:300] Received status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000
> I0929 08:58:33.680452 45300 status_update_manager.hpp:337] Checkpointing UPDATE for status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000
>     @     0x7f9035fb562d  google::LogMessage::Fail()
>     @     0x7f9035fb9617  google::LogMessage::SendToLog()
>     @     0x7f9035fb7f14  google::LogMessage::Flush()
> I0929 08:58:35.929435 45300 status_update_manager.cpp:351] Forwarding status update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f of framework 201205082337-0000000003-0000 to [email protected]:5050
>     @     0x7f9035fb8146  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9035d1a83f  mesos::internal::ZooKeeperMasterDetectorProcess::connected()
>     @     0x7f9035d1f118  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f9035d21b84  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f9035ea6f84  process::ProcessManager::resume()
>     @     0x7f9035ea79df  process::schedule()
>     @     0x7f903561083d  start_thread
>     @     0x7f9033ff2f8d  clone
> {noformat}
> The slave exited with SIGABRT. Is this a ZooKeeper connection issue? Should Mesos
> handle this gracefully?



--
This message was sent by Atlassian JIRA
(v6.1#6144)
