[https://issues.apache.org/jira/browse/SOLR-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286088#comment-16286088]
Cassandra Targett commented on SOLR-11740:
------------------------------------------
It's also possible to reproduce this with {{bin/solr start -c}} if you then
start another node on another port (7574) and create a collection with
{{bin/solr create}} (I did the create with the {{-s 2 -rf 2}} options). It's always
the port 8983 instance that fails to stop, and that's also the one that launches the
embedded ZK and is the leader.
My suspicion is that it's somehow related to the autoAddReplicas and autoscaling
features: during shutdown, stopping the port 7574 node registers as a
"nodeLost" event, which you can see in the logs. I wonder if something is
blocking the shutdown of the port 8983 node. I can't really tell from the logs whether it's
just registering the event or actually trying to do something:
{code}
2017-12-11 15:53:24.737 INFO
(zkCallback-3-thread-7-processing-n:192.168.0.28:8983_solr) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for
collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:24.745 INFO
(zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2
r:core_node8 x:test_shard2_replica_n6] o.a.s.c.ShardLeaderElectionContext I may
be the new leader - try and sync
2017-12-11 15:53:24.850 INFO
(zkCallback-3-thread-8-processing-n:192.168.0.28:8983_solr) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for
collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:24.850 INFO
(zkCallback-3-thread-6-processing-n:192.168.0.28:8983_solr) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for
collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:27.248 INFO
(zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2
r:core_node8 x:test_shard2_replica_n6] o.a.s.c.SyncStrategy Sync replicas to
http://192.168.0.28:8983/solr/test_shard2_replica_n6/
2017-12-11 15:53:27.249 INFO
(zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2
r:core_node8 x:test_shard2_replica_n6] o.a.s.c.SyncStrategy Sync Success - now
sync replicas to me
2017-12-11 15:53:27.249 INFO
(zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2
r:core_node8 x:test_shard2_replica_n6] o.a.s.c.SyncStrategy
http://192.168.0.28:8983/solr/test_shard2_replica_n6/ has no replicas
2017-12-11 15:53:27.254 INFO
(zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2
r:core_node8 x:test_shard2_replica_n6] o.a.s.c.ShardLeaderElectionContext I am
the new leader: http://192.168.0.28:8983/solr/test_shard2_replica_n6/ shard2
2017-12-11 15:53:27.255 INFO
(zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for
collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:27.255 INFO
(zkCallback-3-thread-6-processing-n:192.168.0.28:8983_solr) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for
collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:55.582 INFO (ScheduledTrigger-6-thread-2) [ ]
o.a.s.c.a.SystemLogListener Collection .system does not exist, disabling
logging.
2017-12-11 15:53:55.601 INFO (qtp575335780-14) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/metrics
params={prefix=CORE.coreName&wt=javabin&version=2&group=solr.core} status=0
QTime=6
2017-12-11 15:53:55.606 INFO
(AutoscalingActionExecutor-7-thread-1-processing-n:192.168.0.28:8983_solr) [
] o.a.s.c.a.ExecutePlanAction No operations to execute for event: {
"id":"14ff48669eb06380Tbkths0f4h9k570n1nrsqrq81v",
"source":".auto_add_replicas",
"eventTime":1513007605406000000,
"eventType":"NODELOST",
"properties":{
"eventTimes":[1513007605406000000],
"_enqueue_time_":1513007635493000000,
"nodeNames":["192.168.0.28:7574_solr"]}}
{code}
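The {{eventTime}} and {{_enqueue_time_}} values in the NODELOST event look like nanoseconds since the Unix epoch. That is an assumption, but decoding them that way lines up with the log timestamps: the node was recorded as lost at 15:53:25.406 and the event was enqueued about 30 seconds later, right before the 15:53:55 {{ExecutePlanAction}} line. A minimal sketch of the decoding; the class and method names are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;

final class NodeLostEventTimes {
    // Assumption: event timestamps are nanoseconds since the Unix epoch.
    static Instant fromEpochNanos(long epochNanos) {
        // ofEpochSecond normalizes a nano adjustment larger than one second.
        return Instant.ofEpochSecond(0L, epochNanos);
    }

    // Time between the node being recorded as lost and the event being enqueued.
    static Duration enqueueDelay(long eventTimeNanos, long enqueueTimeNanos) {
        return Duration.between(fromEpochNanos(eventTimeNanos), fromEpochNanos(enqueueTimeNanos));
    }

    public static void main(String[] args) {
        long eventTime = 1513007605406000000L;   // "eventTime" from the log above
        long enqueueTime = 1513007635493000000L; // "_enqueue_time_" from the log above
        System.out.println(fromEpochNanos(eventTime));            // 2017-12-11T15:53:25.406Z
        System.out.println(fromEpochNanos(enqueueTime));          // 2017-12-11T15:53:55.493Z
        System.out.println(enqueueDelay(eventTime, enqueueTime)); // PT30.087S
    }
}
```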
I think it's just registering the event and doesn't (or can't) actually do
anything, since only a single node is left (IOW, there isn't anywhere for it to do
anything in this scenario). I saw the stop eventually time out and forcefully kill
the process, but it seems Varun didn't see that (I think it took ~5 minutes before it
did that).
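The "time out, then forcefully kill" behavior described above comes from bin/solr, which is a shell script. Purely as an illustration of that pattern, here is a minimal Java 9+ sketch using {{ProcessHandle}}; the class name, the {{politeRequest}} callback (standing in for the stop request) and the grace period are hypothetical, not bin/solr settings:

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class GracefulStopper {
    /**
     * Sends a polite stop request, waits up to {@code grace} for the process to exit,
     * and falls back to a forced kill. Returns true only if the process exited on its own.
     */
    static boolean stop(ProcessHandle process, Runnable politeRequest, Duration grace)
            throws InterruptedException, ExecutionException {
        politeRequest.run(); // hypothetical stand-in for e.g. a stop-port request
        try {
            process.onExit().get(grace.toMillis(), TimeUnit.MILLISECONDS);
            return true; // exited within the grace period
        } catch (TimeoutException e) {
            process.destroyForcibly(); // SIGKILL on Unix-like systems
            process.onExit().get();    // block until the OS reports the exit
            return false;
        }
    }
}
```

With a callback that does nothing, {{stop}} waits for the full grace period, kills the process, and returns {{false}}, which matches the forced-kill outcome described above.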
We probably need [~shalinmangar] or [~caomanhdat] to take a look and confirm whether my
hunch is correct.
If that's not it, SOLR-9137 made some changes to the stop behavior and, IMO, would
be the second place to look.
> bin/solr stop command always throws Connection refused
> ------------------------------------------------------
>
> Key: SOLR-11740
> URL: https://issues.apache.org/jira/browse/SOLR-11740
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Varun Thacker
> Priority: Blocker
> Fix For: 7.2, master (8.0)
>
>
> Start Solr using {{./bin/solr start -e cloud -noprompt}} and then try
> stopping it. I ran into this problem every time I stopped Solr on master.
> I'm using Java 9, and it works fine on Solr 7.1 (I haven't checked the 7_2
> branch yet).
> [master] ~/apache-work/lucene-solr/solr$ ./bin/solr stop -all
> Sending stop command to Solr running on port 7574 ... waiting up to 180 seconds to allow Jetty process 40360 to stop gracefully.
> Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 40263 to stop gracefully.
> java.net.ConnectException: Connection refused (Connection refused)
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
> at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at java.net.Socket.<init>(Socket.java:434)
> at java.net.Socket.<init>(Socket.java:244)
> at org.eclipse.jetty.start.Main.stop(Main.java:535)
> at org.eclipse.jetty.start.Main.stop(Main.java:511)
> at org.eclipse.jetty.start.Main.doStop(Main.java:499)
> at org.eclipse.jetty.start.Main.start(Main.java:404)
> at org.eclipse.jetty.start.Main.main(Main.java:76)
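The trace shows {{org.eclipse.jetty.start.Main.stop}} failing inside {{new Socket(...)}}: the stop command opens a TCP connection to Jetty's stop port and sends the stop key, so "Connection refused" means nothing was accepting connections on that stop port at that moment. A minimal sketch of that client side, assuming the request format used by Jetty 9's start module (the stop key, then a {{stop}} line); {{StopPortClient}}, {{sendStop}}, the port and the key are illustrative, not bin/solr's actual values:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class StopPortClient {
    /**
     * Hypothetical helper modelled on a Jetty-style stop request: connect to the
     * stop port on the loopback interface and send the stop key plus a "stop" command.
     * If nothing is listening on stopPort, the Socket constructor throws
     * java.net.ConnectException, as in the trace above.
     */
    static void sendStop(int stopPort, String stopKey) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), stopPort);
             OutputStream out = socket.getOutputStream()) {
            // Assumed wire format: "<key>\r\nstop\r\n"
            out.write((stopKey + "\r\nstop\r\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }
}
```

Calling {{sendStop}} against a port with no listener raises the same {{java.net.ConnectException}} seen in the description.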
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)