[
https://issues.apache.org/jira/browse/SOLR-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990248#comment-15990248
]
Mikhail Khludnev commented on SOLR-10588:
-----------------------------------------
What's really questionable is the double core reload.
{code}
[junit4] 2> 1613694 INFO
(TEST-SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection-seed#[D64D4AF1644074DE])
[ ] o.a.s.c.SolrCloudExampleTest Sending set-property
'updateHandler.autoSoftCommit.maxTime'=3000 to SolrCLI.ConfigTool.
[junit4] 1>
[junit4] 1> POSTing request to Config API:
http://127.0.0.1:50975//gettingstarted/config
[junit4] 1>
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
[junit4] 2> 1613699 INFO (qtp1233100485-18318) [n:127.0.0.1:57043_
c:gettingstarted s:shard1 r:core_node3 x:gettingstarted_shard1_replica1]
o.a.s.h.SolrConfigHandler Executed config commands successfully and persisted
to ZK [{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}]
[junit4] 2> 1613699 INFO (qtp1233100485-18318) [n:127.0.0.1:57043_
c:gettingstarted s:shard1 r:core_node3 x:gettingstarted_shard1_replica1]
o.a.s.h.SolrConfigHandler Waiting up to 30 secs for 4 replicas to set the
property overlay to be of version 0 for collection gettingstarted
[junit4] 2> 1613700 INFO (Thread-5447) [n:127.0.0.1:56894_ ]
o.a.s.c.SolrCore config update listener called for core
gettingstarted_shard2_replica2
[junit4] 2> 1613701 INFO (Thread-5447) [n:127.0.0.1:56894_ ]
o.a.s.c.SolrCore core reload gettingstarted_shard2_replica2
[junit4] 2> 1613702 INFO (Thread-5449) [n:127.0.0.1:57043_ ]
o.a.s.c.SolrCore config update listener called for core
gettingstarted_shard1_replica1
[junit4] 2> 1613702 INFO (Thread-5450) [n:127.0.0.1:49705_ ]
o.a.s.c.SolrCore config update listener called for core
gettingstarted_shard1_replica2
[junit4] 2> 1613703 INFO (Thread-5449) [n:127.0.0.1:57043_ ]
o.a.s.c.SolrCore core reload gettingstarted_shard1_replica1
[junit4] 2> 1613704 INFO (Thread-5450) [n:127.0.0.1:49705_ ]
o.a.s.c.SolrCore core reload gettingstarted_shard1_replica2
[junit4] 2> 1613706 INFO (Thread-5448) [n:127.0.0.1:32921_ ]
o.a.s.c.SolrCore config update listener called for core
gettingstarted_shard2_replica1
[junit4] 2> 1613711 INFO (Thread-5448) [n:127.0.0.1:32921_ ]
o.a.s.c.SolrCore core reload gettingstarted_shard2_replica1
{code}
The config change is posted to ZK, and it seems this triggers a core reload
through the ZK listener registered in {{SolrCore.registerConfListener()}}. I
guess this because of the {{Thread-54*}} thread names.
But then the reload happens again via {{SolrConfigHandler.Command.handleGET()}}:
{code}
[junit4] 2> 1613952 INFO (SolrConfigHandler-refreshconf)
[n:127.0.0.1:49705_ c:gettingstarted s:shard1 r:core_node4
x:gettingstarted_shard1_replica2] o.a.s.c.SolrCore core reload
gettingstarted_shard1_replica2
[junit4] 2> 1613937 INFO (SolrConfigHandler-refreshconf)
[n:127.0.0.1:57043_ c:gettingstarted s:shard1 r:core_node3
x:gettingstarted_shard1_replica1] o.a.s.c.SolrCore config update listener
called for core gettingstarted_shard1_replica1
[junit4] 2> 1613955 INFO (SolrConfigHandler-refreshconf)
[n:127.0.0.1:57043_ c:gettingstarted s:shard1 r:core_node3
x:gettingstarted_shard1_replica1] o.a.s.c.SolrCore core reload
gettingstarted_shard1_replica1
[junit4] 2> 1613956 INFO (SolrConfigHandler-refreshconf)
[n:127.0.0.1:32921_ c:gettingstarted s:shard2 r:core_node2
x:gettingstarted_shard2_replica1] o.a.s.c.SolrCore core reload
gettingstarted_shard2_replica1
{code}
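For illustration only, a minimal sketch of how two triggers for the same config version could be coalesced into one reload. It assumes both triggers know the ZK version of the overlay they react to. {{VersionGuardedReloader}} and {{reloadIfNewer}} are hypothetical names, not existing Solr code, and the counter stands in for the real reload.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not Solr code: several triggers may request a reload
 * for the same config version, but only the first one performs it.
 */
class VersionGuardedReloader {
    // Highest config version already claimed for reload; -1 means none yet.
    private final AtomicInteger loadedVersion = new AtomicInteger(-1);
    // Counts reloads actually performed (stand-in for the expensive core reload).
    private final AtomicInteger reloadCount = new AtomicInteger();

    /** Returns true if this call performed the reload, false if it was redundant. */
    boolean reloadIfNewer(int zkVersion) {
        while (true) {
            int current = loadedVersion.get();
            if (zkVersion <= current) {
                return false; // another trigger already claimed this version
            }
            // Simplification: the version is claimed before the reload finishes.
            if (loadedVersion.compareAndSet(current, zkVersion)) {
                reloadCount.incrementAndGet();
                return true;
            }
            // Lost a race with another thread; re-check against the new value.
        }
    }

    int reloadCount() {
        return reloadCount.get();
    }
}
```

With such a guard, a ZK listener thread and a refresh thread that both see version 0 would result in a single reload; whether the overlay version is actually available at both call sites is not something the logs above confirm.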
Then the script triggers the delete:
{code}
[junit4] 1> Deleting collection 'gettingstarted' using command:
[junit4] 1>
http://127.0.0.1:50975/admin/collections?action=DELETE&name=gettingstarted
{code}
The delete is actually in progress while {{SolrConfigHandler-refreshconf}}
reloads the config; the reload breaks with an NPE, and this might cause a leak.
What we can also see from the leak dump is that the leaked objects were created
by the core create command, not during reload.
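A minimal sketch of the kind of guard that would keep a late reload from running against a core that is already being deleted. {{GuardedCore}}, {{tryReload}} and {{close}} are hypothetical names for illustration, not the real {{SolrCore}} API, and the counter stands in for the actual reload work.

```java
/**
 * Hypothetical sketch, not Solr code: a core that refuses to reload
 * once close (for example, as part of a collection delete) has started.
 */
class GuardedCore {
    private boolean closed; // guarded by "this"
    private int reloads;    // guarded by "this"

    /** Runs the reload unless the core is closed; returns whether it ran. */
    synchronized boolean tryReload() {
        if (closed) {
            return false; // skip instead of touching torn-down state
        }
        reloads++; // stand-in for the real reload work
        return true;
    }

    /** Marks the core closed; later reload attempts become no-ops. */
    synchronized void close() {
        closed = true;
    }

    synchronized int reloads() {
        return reloads;
    }
}
```

Because both methods take the same monitor, a reload either completes before the close or is skipped after it; it cannot interleave with it.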
The question is: should it really reload the core twice? Can't
{{SolrConfigHandler.Command.handleGET}} be synchronous, or pollable via async?
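To make the two options concrete, a rough sketch of a refresh that can be either awaited or polled, so a caller (such as the test, before it deletes the collection) can tell when the background reload is over. {{RefreshTracker}} and its methods are hypothetical and only illustrate the idea; this is not how {{SolrConfigHandler}} works today.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch, not Solr code: config refreshes callers can wait on or poll. */
class RefreshTracker {
    private final Map<Long, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    /** Starts the refresh in the background and returns an id for later polling. */
    long submit(Runnable refresh) {
        long id = ids.incrementAndGet();
        pending.put(id, CompletableFuture.runAsync(refresh));
        return id;
    }

    /** Polling variant: true once the refresh with this id has finished. */
    boolean isDone(long id) {
        CompletableFuture<Void> f = pending.get(id);
        return f != null && f.isDone();
    }

    /** Synchronous variant: blocks until the refresh with this id has finished. */
    void await(long id) {
        CompletableFuture<Void> f = pending.get(id);
        if (f == null) {
            throw new IllegalArgumentException("unknown refresh id " + id);
        }
        f.join(); // rethrows a failure of the refresh as CompletionException
    }
}
```

Either variant would let the caller hold off on the delete until no refresh is still in flight.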
cc [~noble.paul]
> SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection failure caused
> by NullPointerException at SolrMetricManager.loadShardReporters
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-10588
> URL: https://issues.apache.org/jira/browse/SOLR-10588
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Mikhail Khludnev
> Attachments: consoleFull.html.zip
>
>
> https://builds.apache.org/job/Lucene-Solr-Tests-master/1788/testReport/junit/junit.framework/TestSuite/org_apache_solr_cloud_SolrCloudExampleTest/
> this NPE, even though it might be quite reasonable in itself, breaks core
> reload and applying the config param. I'm sure it causes these constant
> failures.