[ 
https://issues.apache.org/jira/browse/SOLR-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286088#comment-16286088
 ] 

Cassandra Targett commented on SOLR-11740:
------------------------------------------

It's also possible to reproduce this with {{bin/solr start -c}} if you then 
start another node on another port (7574) and create a collection with 
{{bin/solr create}} (I ran the create with the {{-s 2 -rf 2}} options). It's 
always the port 8983 instance that fails to stop, and that's the one that 
launches ZK and is the leader.

My suspicion is that it's somehow related to the autoAddReplicas and 
autoscaling features: during the stop, the shutdown of port 7574 registers as a 
"nodeLost" event, which you can see in the logs. I wonder if something is 
blocking the shutdown of port 8983? I can't really tell from the logs whether 
it's just registering the event or actually trying to do something:

{code}
2017-12-11 15:53:24.737 INFO  (zkCallback-3-thread-7-processing-n:192.168.0.28:8983_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:24.745 INFO  (zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2 r:core_node8 x:test_shard2_replica_n6] o.a.s.c.ShardLeaderElectionContext I may be the new leader - try and sync
2017-12-11 15:53:24.850 INFO  (zkCallback-3-thread-8-processing-n:192.168.0.28:8983_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:24.850 INFO  (zkCallback-3-thread-6-processing-n:192.168.0.28:8983_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:27.248 INFO  (zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2 r:core_node8 x:test_shard2_replica_n6] o.a.s.c.SyncStrategy Sync replicas to http://192.168.0.28:8983/solr/test_shard2_replica_n6/
2017-12-11 15:53:27.249 INFO  (zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2 r:core_node8 x:test_shard2_replica_n6] o.a.s.c.SyncStrategy Sync Success - now sync replicas to me
2017-12-11 15:53:27.249 INFO  (zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2 r:core_node8 x:test_shard2_replica_n6] o.a.s.c.SyncStrategy http://192.168.0.28:8983/solr/test_shard2_replica_n6/ has no replicas
2017-12-11 15:53:27.254 INFO  (zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [c:test s:shard2 r:core_node8 x:test_shard2_replica_n6] o.a.s.c.ShardLeaderElectionContext I am the new leader: http://192.168.0.28:8983/solr/test_shard2_replica_n6/ shard2
2017-12-11 15:53:27.255 INFO  (zkCallback-3-thread-5-processing-n:192.168.0.28:8983_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:27.255 INFO  (zkCallback-3-thread-6-processing-n:192.168.0.28:8983_solr) [   ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/test/state.json] for collection [test] has occurred - updating... (live nodes size: [1])
2017-12-11 15:53:55.582 INFO  (ScheduledTrigger-6-thread-2) [   ] o.a.s.c.a.SystemLogListener Collection .system does not exist, disabling logging.
2017-12-11 15:53:55.601 INFO  (qtp575335780-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={prefix=CORE.coreName&wt=javabin&version=2&group=solr.core} status=0 QTime=6
2017-12-11 15:53:55.606 INFO  (AutoscalingActionExecutor-7-thread-1-processing-n:192.168.0.28:8983_solr) [   ] o.a.s.c.a.ExecutePlanAction No operations to execute for event: {
  "id":"14ff48669eb06380Tbkths0f4h9k570n1nrsqrq81v",
  "source":".auto_add_replicas",
  "eventTime":1513007605406000000,
  "eventType":"NODELOST",
  "properties":{
    "eventTimes":[1513007605406000000],
    "_enqueue_time_":1513007635493000000,
    "nodeNames":["192.168.0.28:7574_solr"]}}
{code}

I think it's just registering the event, but doesn't or can't actually do 
anything since only a single node is left (IOW, there is no other live node 
for it to act on in this scenario). I saw the stop command eventually time out 
and forcefully kill the process, but it seems Varun didn't see that (I think it 
took ~5 minutes before it did that).
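
One way to check that reading mechanically is to pull the event payloads out of the log and look at what the action executor reported. The following is a minimal sketch, not part of Solr: the helper names {{extract_noop_events}} and {{wait_seconds}} are hypothetical, the sample text is copied from the log above, and it assumes the {{eventTime}} and {{_enqueue_time_}} fields are epoch nanoseconds.

```python
import json

# Text that ExecutePlanAction logs right before the event payload (taken from the log above).
MARKER = "No operations to execute for event:"


def extract_noop_events(log_text):
    """Return the JSON event payloads that follow each MARKER in log_text."""
    decoder = json.JSONDecoder()
    events = []
    pos = log_text.find(MARKER)
    while pos != -1:
        start = log_text.find("{", pos)
        if start == -1:
            break
        # raw_decode parses one JSON value and returns it with the index where it ended.
        event, end = decoder.raw_decode(log_text, start)
        events.append(event)
        pos = log_text.find(MARKER, end)
    return events


def wait_seconds(event):
    """Seconds between the event and its enqueue time (assumes nanosecond timestamps)."""
    enqueued = event["properties"]["_enqueue_time_"]
    return (enqueued - event["eventTime"]) / 1e9


SAMPLE = """
2017-12-11 15:53:55.606 INFO  (AutoscalingActionExecutor-7-thread-1-processing-n:192.168.0.28:8983_solr) [   ] o.a.s.c.a.ExecutePlanAction No operations to execute for event: {
  "id":"14ff48669eb06380Tbkths0f4h9k570n1nrsqrq81v",
  "source":".auto_add_replicas",
  "eventTime":1513007605406000000,
  "eventType":"NODELOST",
  "properties":{
    "eventTimes":[1513007605406000000],
    "_enqueue_time_":1513007635493000000,
    "nodeNames":["192.168.0.28:7574_solr"]}}
"""

events = extract_noop_events(SAMPLE)
for event in events:
    print(event["eventType"], event["properties"]["nodeNames"], round(wait_seconds(event), 3))
# prints: NODELOST ['192.168.0.28:7574_solr'] 30.087
```

For this log, the only autoscaling event is the NODELOST for the 7574 node, and the executor reports no operations for it. The event was enqueued about 30 seconds after it was detected; whether that gap is the trigger's configured wait period is a guess on my part, and it doesn't by itself explain why the 8983 process keeps running.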

Probably need [~shalinmangar] or [~caomanhdat] to take a look to see if my 
hunch is correct.

If that's not it, SOLR-9137 made some changes to the stop behavior and IMO would 
be the second place to look.

> bin/solr stop command always throws Connection refused
> ------------------------------------------------------
>
>                 Key: SOLR-11740
>                 URL: https://issues.apache.org/jira/browse/SOLR-11740
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Priority: Blocker
>             Fix For: 7.2, master (8.0)
>
>
> Start Solr using {{./bin/solr start -e cloud -noprompt}} and then try 
> stopping it. I ran into this problem every time I stopped Solr on master. 
> I'm using Java 9, and it works fine on Solr 7.1 (I haven't checked the 7_2 
> branch yet).
> [master] ~/apache-work/lucene-solr/solr$ ./bin/solr  stop -all
> Sending stop command to Solr running on port 7574 ... waiting up to 180 seconds to allow Jetty process 40360 to stop gracefully.
> Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 40263 to stop gracefully.
> java.net.ConnectException: Connection refused (Connection refused)
>       at java.net.PlainSocketImpl.socketConnect(Native Method)
>       at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>       at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>       at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>       at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>       at java.net.Socket.connect(Socket.java:589)
>       at java.net.Socket.connect(Socket.java:538)
>       at java.net.Socket.<init>(Socket.java:434)
>       at java.net.Socket.<init>(Socket.java:244)
>       at org.eclipse.jetty.start.Main.stop(Main.java:535)
>       at org.eclipse.jetty.start.Main.stop(Main.java:511)
>       at org.eclipse.jetty.start.Main.doStop(Main.java:499)
>       at org.eclipse.jetty.start.Main.start(Main.java:404)
>       at org.eclipse.jetty.start.Main.main(Main.java:76)


