[jira] [Closed] (AURORA-1786) -zk_session_timeout option does not work

David Robinson (JIRA) Thu, 29 Sep 2016 18:01:36 -0700

     [ 
https://issues.apache.org/jira/browse/AURORA-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Robinson closed AURORA-1786.
----------------------------------
    Resolution: Not A Problem

Actually, it looks like there's a maximum session timeout the ZooKeeper server will accept, so a larger requested value gets reduced to that maximum.

{quote}
One of the parameters to the ZooKeeper client library call to create a 
ZooKeeper session is the session timeout in milliseconds. The client sends a 
requested timeout, the server responds with the timeout that it can give the 
client. The current implementation requires that the timeout be a minimum of 2 
times the tickTime (as set in the server configuration) and a maximum of 20 
times the tickTime. The ZooKeeper client API allows access to the negotiated 
timeout.
{quote}

https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkSessions
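
To make the negotiation in the quote concrete, here is a minimal sketch of the clamping it describes. The function name and the 2000 ms tickTime are assumptions for illustration only (the tickTime is not taken from this cluster's server configuration), and a real server may apply bounds configured differently from the 2x/20x defaults.

```python
def negotiated_timeout_ms(requested_ms: int, tick_time_ms: int = 2000) -> int:
    """Return the session timeout a server would grant, per the quoted rule.

    Hypothetical helper: tick_time_ms=2000 is an assumed value, not read
    from any real ZooKeeper server configuration.
    """
    min_ms = 2 * tick_time_ms    # lower bound: 2 x tickTime
    max_ms = 20 * tick_time_ms   # upper bound: 20 x tickTime
    # Clamp the client's requested timeout into [min_ms, max_ms].
    return max(min_ms, min(requested_ms, max_ms))


if __name__ == "__main__":
    thirty_minutes_ms = 30 * 60 * 1000
    # A 30-minute request is cut down to the upper bound (40000 ms here).
    print(negotiated_timeout_ms(thirty_minutes_ms))
```

Under these assumptions, any requested timeout above 20 times the tickTime comes back as the same capped value, which would explain why raising -zk_session_timeout further appears to have no effect.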

> -zk_session_timeout option does not work
> ----------------------------------------
>
>                 Key: AURORA-1786
>                 URL: https://issues.apache.org/jira/browse/AURORA-1786
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: David Robinson
>
> Looks like the -zk_session_timeout option has no effect. I've set 
> -zk_session_timeout="60mins" to attempt to work around ZK session timeouts 
> (due to GC pauses caused by TaskHistoryPruner pruning a huge number of 
> inactive tasks), but the default 30 seconds seems to always be used.
> {noformat}
> I0929 22:36:10.804 [main, ArgScanner:411] zk_chroot_path: null 
> I0929 22:36:10.804 [main, ArgScanner:411] zk_digest_credentials: xxxx:xxxx 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_endpoints: [zk.example.com:2181] 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_in_proc: false 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_session_timeout: (30, mins) 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_use_curator: true 
> {noformat}
> {noformat}
> I0929 22:48:37.678 [AsyncProcessor-3, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.738 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> 2016-09-29 
> 22:48:37,794:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded 
> deadline by 12ms
> I0929 22:48:37.805 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.814 [AsyncProcessor-6, MemTaskStore:148] Query took 588 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:37.867 [AsyncProcessor-1, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.873 [AsyncProcessor-2, MemTaskStore:148] Query took 304 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:37.875 [AsyncProcessor-7, MemTaskStore:148] Query took 289 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:37.886 [AsyncProcessor-4, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:38.045 [AsyncProcessor-3, MemTaskStore:148] Query took 359 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:38.152 [AsyncProcessor-5, MemTaskStore:148] Query took 405 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:38.407 [AsyncProcessor-0, MemTaskStore:148] Query took 594 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:38.442 [AsyncProcessor-1, MemTaskStore:148] Query took 566 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:38.445 [AsyncProcessor-4, MemTaskStore:148] Query took 550 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:48:38.460 [AsyncProcessor-7, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:38.468 [AsyncProcessor-2, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> 2016-09-29 
> 22:48:51,141:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded 
> deadline by 13ms
> I0929 22:49:01.002467 47173 process.cpp:3323] Handling HTTP event for process 
> 'metrics' with path: '/metrics/snapshot'
> I0929 22:48:38.483 [AsyncProcessor-6, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> W0929 22:49:07.165 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181), 
> ClientCnxn$SendThread:1108] Client session timed out, have not heard from 
> server in 36019ms for sessionid 0x576f9386901ce3 
> W0929 22:49:07.168 [qtp382517336-72, LeaderRedirect:194] No 
> serviceGroupMonitor in host set, will not redirect despite not being leader. 
> I0929 22:49:07.170 [qtp382517336-72, Slf4jRequestLog:60] 127.0.0.1 - - 
> [29/Sep/2016:22:49:07 +0000] "GET //localhost:8081/quotas HTTP/1.1" 503 1561  
> I0929 22:49:07.171 [AsyncProcessor-7, MemTaskStore:148] Query took 28701 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:49:07.171 [AsyncProcessor-2, MemTaskStore:148] Query took 28693 ms: 
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
> offset=0, limit=0} 
> I0929 22:49:07.171 [qtp382517336-52, Slf4jRequestLog:60] 127.0.0.1 - - 
> [29/Sep/2016:22:49:07 +0000] "GET //localhost:8081/vars.json?filtered=1 
> HTTP/1.1" 200 34679  
> I0929 22:49:07.172 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181), 
> ClientCnxn$SendThread:1156] Client session timed out, have not heard from 
> server in 36019ms for sessionid 0x576f9386901ce3, closing socket connection 
> and attempting reconnect 
> I0929 22:49:07.179 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:49:07.179 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive 
> tasks 
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:49:07.273 [main-EventThread, ConnectionStateManager:228] State 
> change: SUSPENDED 
> E0929 22:49:07.345 [Curator-ConnectionStateManager-0, 
> SchedulerLifecycle$SchedulerCandidateImpl:395] Lost leadership, committing 
> suicide. 
> I0929 22:49:07.359 [Curator-ConnectionStateManager-0, 
> StateMachine$Builder:389] SchedulerLifecycle state machine transition 
> LEADER_AWAITING_REGISTRATION -> DEAD
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
