[ https://issues.apache.org/jira/browse/AURORA-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Robinson closed AURORA-1786.
----------------------------------
Resolution: Not A Problem
Actually, it looks like there is a maximum session timeout that the ZooKeeper server will accept, so a larger requested value is reduced to that maximum.
{quote}
One of the parameters to the ZooKeeper client library call to create a
ZooKeeper session is the session timeout in milliseconds. The client sends a
requested timeout, the server responds with the timeout that it can give the
client. The current implementation requires that the timeout be a minimum of 2
times the tickTime (as set in the server configuration) and a maximum of 20
times the tickTime. The ZooKeeper client API allows access to the negotiated
timeout.
{quote}
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkSessions
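For illustration, the negotiation described in the quoted documentation can be sketched as a simple clamp. This is only a rough sketch of the documented rule, not ZooKeeper's actual code: the function name is made up, and the tickTime of 2000 ms is an assumption, since this ticket does not state the server's configured tickTime.

```python
def negotiate_session_timeout(requested_ms: int, tick_time_ms: int) -> int:
    """Clamp a requested session timeout to [2 * tickTime, 20 * tickTime].

    Hypothetical helper that mirrors the rule in the quoted documentation.
    """
    min_ms = 2 * tick_time_ms
    max_ms = 20 * tick_time_ms
    return max(min_ms, min(requested_ms, max_ms))


# Assumption: the server's tickTime is 2000 ms (not stated in this ticket).
TICK_TIME_MS = 2000

# A 60 minute request, as in -zk_session_timeout="60mins".
requested_ms = 60 * 60 * 1000
negotiated_ms = negotiate_session_timeout(requested_ms, TICK_TIME_MS)
print(f"requested={requested_ms} ms, negotiated={negotiated_ms} ms")
```

Under those assumptions a 60 minute request would be granted only 40 seconds (20 times the tickTime). The value the server actually granted is available through the client API as the negotiated timeout, so that is the number to check rather than the flag value.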
> -zk_session_timeout option does not work
> ----------------------------------------
>
> Key: AURORA-1786
> URL: https://issues.apache.org/jira/browse/AURORA-1786
> Project: Aurora
> Issue Type: Bug
> Reporter: David Robinson
>
> Looks like the -zk_session_timeout option has no effect. I've set
> -zk_session_timeout="60mins" to attempt to work around ZK session timeouts
> (due to GC pauses caused by TaskHistoryPruner pruning a huge number of
> inactive tasks), but the default 30 seconds seems to always be used.
> {noformat}
> I0929 22:36:10.804 [main, ArgScanner:411] zk_chroot_path: null
> I0929 22:36:10.804 [main, ArgScanner:411] zk_digest_credentials: xxxx:xxxx
> I0929 22:36:10.805 [main, ArgScanner:411] zk_endpoints: [zk.example.com:2181]
> I0929 22:36:10.805 [main, ArgScanner:411] zk_in_proc: false
> I0929 22:36:10.805 [main, ArgScanner:411] zk_session_timeout: (30, mins)
> I0929 22:36:10.805 [main, ArgScanner:411] zk_use_curator: true
> {noformat}
> {noformat}
> I0929 22:48:37.678 [AsyncProcessor-3, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.738 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> 2016-09-29
> 22:48:37,794:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded
> deadline by 12ms
> I0929 22:48:37.805 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.814 [AsyncProcessor-6, MemTaskStore:148] Query took 588 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:37.867 [AsyncProcessor-1, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.873 [AsyncProcessor-2, MemTaskStore:148] Query took 304 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:37.875 [AsyncProcessor-7, MemTaskStore:148] Query took 289 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:37.886 [AsyncProcessor-4, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:38.045 [AsyncProcessor-3, MemTaskStore:148] Query took 359 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:38.152 [AsyncProcessor-5, MemTaskStore:148] Query took 405 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:38.407 [AsyncProcessor-0, MemTaskStore:148] Query took 594 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:38.442 [AsyncProcessor-1, MemTaskStore:148] Query took 566 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:38.445 [AsyncProcessor-4, MemTaskStore:148] Query took 550 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:48:38.460 [AsyncProcessor-7, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:38.468 [AsyncProcessor-2, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> 2016-09-29
> 22:48:51,141:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded
> deadline by 13ms
> I0929 22:49:01.002467 47173 process.cpp:3323] Handling HTTP event for process
> 'metrics' with path: '/metrics/snapshot'
> I0929 22:48:38.483 [AsyncProcessor-6, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> W0929 22:49:07.165 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181),
> ClientCnxn$SendThread:1108] Client session timed out, have not heard from
> server in 36019ms for sessionid 0x576f9386901ce3
> W0929 22:49:07.168 [qtp382517336-72, LeaderRedirect:194] No
> serviceGroupMonitor in host set, will not redirect despite not being leader.
> I0929 22:49:07.170 [qtp382517336-72, Slf4jRequestLog:60] 127.0.0.1 - -
> [29/Sep/2016:22:49:07 +0000] "GET //localhost:8081/quotas HTTP/1.1" 503 1561
> I0929 22:49:07.171 [AsyncProcessor-7, MemTaskStore:148] Query took 28701 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:49:07.171 [AsyncProcessor-2, MemTaskStore:148] Query took 28693 ms:
> ITaskQuery{role=null, environment=null, jobName=null, taskIds=[],
> statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[],
> jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}],
> offset=0, limit=0}
> I0929 22:49:07.171 [qtp382517336-52, Slf4jRequestLog:60] 127.0.0.1 - -
> [29/Sep/2016:22:49:07 +0000] "GET //localhost:8081/vars.json?filtered=1
> HTTP/1.1" 200 34679
> I0929 22:49:07.172 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181),
> ClientCnxn$SendThread:1156] Client session timed out, have not heard from
> server in 36019ms for sessionid 0x576f9386901ce3, closing socket connection
> and attempting reconnect
> I0929 22:49:07.179 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:49:07.179 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive
> tasks
> [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
> mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3,
> mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
> mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:49:07.273 [main-EventThread, ConnectionStateManager:228] State
> change: SUSPENDED
> E0929 22:49:07.345 [Curator-ConnectionStateManager-0,
> SchedulerLifecycle$SchedulerCandidateImpl:395] Lost leadership, committing
> suicide.
> I0929 22:49:07.359 [Curator-ConnectionStateManager-0,
> StateMachine$Builder:389] SchedulerLifecycle state machine transition
> LEADER_AWAITING_REGISTRATION -> DEAD
> {noformat}