I’m checking that I’m not on some old branch somehow … I’d have sworn
someone got rid of ZkCmdExecutor.
I can’t touch this Overseer, and I’m dying to see it go, so set aside
the fact that it’s insane that it goes to zk like this to deal with
leadership, or that it’s half impervious to interrupts or any reasonable
shutdown behavior…
If someone gets an itch for more proper zk behavior, a decent
start is to kill these fall-off retries.
Zk alerts us via callback when it loses a connection. When the connection
is back, another callback. An unlimited number of locations trying to work
this out on their own is terrible zk. In an ideal world, everything enters
a zk quiet mode and re-engages when zk says hello again. A simpler,
shorter-term improvement is to sink all the zk calls when they hit the zk
connection manager and not let them go until the connection is restored.
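To make the "sink the calls at the connection manager" idea concrete, here is a rough sketch of what I mean. ConnectionGate and all of its method names are made up for illustration; none of this is existing Solr or ZooKeeper API. The assumption is that the connection callbacks would call onDisconnected() / onReconnected(), and every zk call would go through call() instead of sleeping in its own retry loop.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate; a sketch only, not part of Solr or ZooKeeper. */
final class ConnectionGate {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition stateChanged = lock.newCondition();
  private boolean connected = true;
  private boolean closed = false;

  /** Assumed to be invoked from the zk "disconnected" callback. */
  void onDisconnected() {
    lock.lock();
    try {
      connected = false;
    } finally {
      lock.unlock();
    }
  }

  /** Assumed to be invoked from the zk "connected again" callback. */
  void onReconnected() {
    lock.lock();
    try {
      connected = true;
      stateChanged.signalAll(); // release every parked caller
    } finally {
      lock.unlock();
    }
  }

  /** Shutdown: wake all parked callers so no thread is left behind. */
  void close() {
    lock.lock();
    try {
      closed = true;
      stateChanged.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /**
   * Parks the caller while disconnected, then runs the operation.
   * The wait is interruptible and bounded, unlike a sleep-and-retry loop.
   * The connection can still drop between the check and op.call();
   * this sketch does not try to handle that.
   */
  <T> T call(Callable<T> op, long timeout, TimeUnit unit) throws Exception {
    long nanos = unit.toNanos(timeout);
    lock.lockInterruptibly();
    try {
      while (!connected && !closed) {
        if (nanos <= 0L) {
          throw new TimeoutException("zk connection not restored in time");
        }
        nanos = stateChanged.awaitNanos(nanos);
      }
      if (closed) {
        throw new IllegalStateException("gate closed");
      }
    } finally {
      lock.unlock();
    }
    return op.call();
  }
}
```

With something like this, a caller such as the leadership check would either proceed, time out, get interrupted, or be told the gate is closed; it would not sit in Thread.sleep after shutdown.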
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestHdfsBackupRestoreCore:
   1) Thread[id=1131, name=OverseerExitThread, state=TIMED_WAITING, group=Overseer state updater.]
        at [email protected]/java.lang.Thread.sleep(Native Method)
        at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:156)
        at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:89)
        at app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:343)
        at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:412)
        at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater$$Lambda$835/0x0000000100902440.run(Unknown Source)
        at [email protected]/java.lang.Thread.run(Thread.java:829)
        at __randomizedtesting.SeedInfo.seed([8F6FE499FACF34E4]:0)
--
- Mark
http://about.me/markrmiller