[
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123736#comment-15123736
]
Stephan Lagraulet commented on SOLR-3274:
-----------------------------------------
I think that this message could be misleading according to what is happening to
our servers:
First we receive a disconnect event.
{code}
2016-01-29 14:52:38.868 INFO
(zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ]
o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@478ca953
name:ZooKeeperConnection
Watcher:solrnode013:2181,solrnode014:2181,solrnode015:2181 got event
WatchedEvent state:Disconnected type:None path:null path:null type:None
2016-01-29 14:52:38.868 INFO
(zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ]
o.a.s.c.c.ConnectionManager zkClient has disconnected
{code}
Then we receive an "Expired" event:
{code}
2016-01-29 14:52:40.028 INFO
(zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ]
o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@478ca953
name:ZooKeeperConnection
Watcher:solrnode013:2181,solrnode014:2181,solrnode015:2181 got event
WatchedEvent state:Expired type:None path:null path:null type:None
2016-01-29 14:52:40.028 INFO
(zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ]
o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired.
Attempting to reconnect to recover relationship with ZooKeeper...
2016-01-29 14:52:40.029 INFO
(zkCallback-3-thread-73-processing-n:solrnode027:8983_solr-EventThread) [ ]
o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
{code}
Then the connexion is established:
{code}
2016-01-29 14:52:40.393 INFO
(zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread)
[ ] o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@478ca953
name:ZooKeeperConnection
Watcher:a01solrf013.cdweb.biz:2181,a01solrf014.cdweb.biz:2181,a01solrf015.cdweb.biz:2181
got event WatchedEvent state:SyncConnected type:None path:null path:null
type:None
2016-01-29 14:52:40.393 INFO
(zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread)
[ ] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
2016-01-29 14:52:40.393 INFO
(zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread)
[ ] o.a.s.c.c.ConnectionManager Connection with ZooKeeper reestablished.
2016-01-29 14:52:40.393 INFO
(zkCallback-3-thread-73-processing-n:a01solrf027.cdweb.biz:8983_solr-EventThread)
[ ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing core
states after session expiration.
2016-01-29 14:52:40.034 ERROR (qtp1057941451-80095) [c:offers_full s:shard3
r:core_node7 x:offers_full_shard3_replica2] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are
disabled.
at
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1459)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:660)
at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:260)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:145)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
> ZooKeeper related SolrCloud problems
> ------------------------------------
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0-ALPHA
> Environment: Any
> Reporter: Per Steffensen
>
> Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7
> Solr servers, running 28 slices of the same collection (collA) - all slices
> have one replica (two shards all in all - leader + replica) - 56 cores all in
> all (8 shards on each solr instance). But anyways...
> Besides the problem reported in SOLR-3273, the system seems to run fine under
> high load for several hours, but eventually errors like the ones shown below
> start to occur. I might be wrong, but they all seem to indicate some kind of
> unstability in the collaboration between Solr and ZooKeeper. I have to say
> that I havnt been there to check ZooKeeper "at the moment where those
> exception occur", but basically I dont believe the exceptions occur because
> ZooKeeper is not running stable - at least when I go and check ZooKeeper
> through other "channels" (e.g. my eclipse ZK plugin) it is always accepting
> my connection and generally seems to be doing fine.
> Exception 1) Often the first error we see in solr.log is something like this
> {code}
> Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper -
> Updates are disabled.
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> at
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> I believe this error basically occurs because SolrZkClient.isConnected
> reports false, which means that its internal "keeper.getState" does not
> return ZooKeeper.States.CONNECTED. Im pretty sure that it has been CONNECTED
> for a long time, since this error starts occuring after several hours of
> processing without this problem showing. But why is it suddenly not connected
> anymore?!
> Exception 2) We also see errors like the following, and if Im not mistaken,
> they start occuring shortly after "Exception 1)" (above) shows for the fist
> time
> {code}
> Mar 22, 2012 5:07:26 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> Please note that the exception says "no servers hosting shard: <blank>".
> Looking at the code a "shard"-string was actually supposed to be written at
> <blank>. Basically this means that HttpShardHandler.submit was called with
> an empty "shard"-string parameter. But who does this?
> CoreAdminHandler.handleDistribUrlAction or SearchHandler.handleRequestBody or
> SyncStrategy or PeerSync or... I dont know, and maybe it is not that
> relevant, because I guess they all get the "shard"-string from ZooKeeper.
> Again something pointing in the direction of unstable collaboration between
> Solr and ZooKeeper.
> Exception 3) We also see exceptions like this
> {code}
> Mar 25, 2012 3:05:38 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> WARNING: ZooKeeper watch triggered, but Solr cannot talk to ZK
> Mar 25, 2012 3:05:38 PM org.apache.solr.cloud.LeaderElector$1 process
> WARNING:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode
> = Session expired for /collections/collA/leader_elect/slice26/election
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
> at
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:266)
> at
> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:263)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
> at
> org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:263)
> at
> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:92)
> at
> org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57)
> at
> org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:121)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> {code}
> Maybe this will we usable for some bug-fixing or for making the code more
> stable. I know 4.0 is not stable/released yet, and that we therefore should
> expect this kind of errors at the moment. So this is not negative criticism -
> just reporting of issues observed when using SolrCloud features under high
> load for several days. Any feedback is more than welcome.
> Regards, Per Steffensen
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]