[jira] [Commented] (SOLR-10588) SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection failure caused by NullPointerException at SolrMetricManager.loadShardReporters

Mikhail Khludnev (JIRA) Sun, 30 Apr 2017 06:56:39 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990248#comment-15990248
 ]


Mikhail Khludnev commented on SOLR-10588:
-----------------------------------------

What's really questionable is the double core reload.

{code}
   [junit4]   2> 1613694 INFO  
(TEST-SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection-seed#[D64D4AF1644074DE])
 [    ] o.a.s.c.SolrCloudExampleTest Sending set-property 
'updateHandler.autoSoftCommit.maxTime'=3000 to SolrCLI.ConfigTool.
   [junit4]   1> 
   [junit4]   1> POSTing request to Config API: 
http://127.0.0.1:50975//gettingstarted/config
   [junit4]   1> 
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
   [junit4]   2> 1613699 INFO  (qtp1233100485-18318) [n:127.0.0.1:57043_ 
c:gettingstarted s:shard1 r:core_node3 x:gettingstarted_shard1_replica1] 
o.a.s.h.SolrConfigHandler Executed config commands successfully and persisted 
to ZK [{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}]
   [junit4]   2> 1613699 INFO  (qtp1233100485-18318) [n:127.0.0.1:57043_ 
c:gettingstarted s:shard1 r:core_node3 x:gettingstarted_shard1_replica1] 
o.a.s.h.SolrConfigHandler Waiting up to 30 secs for 4 replicas to set the 
property overlay to be of version 0 for collection gettingstarted
   [junit4]   2> 1613700 INFO  (Thread-5447) [n:127.0.0.1:56894_    ] 
o.a.s.c.SolrCore config update listener called for core 
gettingstarted_shard2_replica2
   [junit4]   2> 1613701 INFO  (Thread-5447) [n:127.0.0.1:56894_    ] 
o.a.s.c.SolrCore core reload gettingstarted_shard2_replica2
   [junit4]   2> 1613702 INFO  (Thread-5449) [n:127.0.0.1:57043_    ] 
o.a.s.c.SolrCore config update listener called for core 
gettingstarted_shard1_replica1
   [junit4]   2> 1613702 INFO  (Thread-5450) [n:127.0.0.1:49705_    ] 
o.a.s.c.SolrCore config update listener called for core 
gettingstarted_shard1_replica2
   [junit4]   2> 1613703 INFO  (Thread-5449) [n:127.0.0.1:57043_    ] 
o.a.s.c.SolrCore core reload gettingstarted_shard1_replica1
   [junit4]   2> 1613704 INFO  (Thread-5450) [n:127.0.0.1:49705_    ] 
o.a.s.c.SolrCore core reload gettingstarted_shard1_replica2
   [junit4]   2> 1613706 INFO  (Thread-5448) [n:127.0.0.1:32921_    ] 
o.a.s.c.SolrCore config update listener called for core 
gettingstarted_shard2_replica1
   [junit4]   2> 1613711 INFO  (Thread-5448) [n:127.0.0.1:32921_    ] 
o.a.s.c.SolrCore core reload gettingstarted_shard2_replica1
{code}
The config change is posted to ZK, and it seems to trigger a core reload through 
the ZK listener registered in {{SolrCore.registerConfListener()}}. I infer this 
from the {{Thread-54*}} thread names.
But then the reload happens again, triggered by 
{{SolrConfigHandler.Command.handleGET()}}:
{code}
   [junit4]   2> 1613952 INFO  (SolrConfigHandler-refreshconf) 
[n:127.0.0.1:49705_ c:gettingstarted s:shard1 r:core_node4 
x:gettingstarted_shard1_replica2] o.a.s.c.SolrCore core reload 
gettingstarted_shard1_replica2
   [junit4]   2> 1613937 INFO  (SolrConfigHandler-refreshconf) 
[n:127.0.0.1:57043_ c:gettingstarted s:shard1 r:core_node3 
x:gettingstarted_shard1_replica1] o.a.s.c.SolrCore config update listener 
called for core gettingstarted_shard1_replica1
   [junit4]   2> 1613955 INFO  (SolrConfigHandler-refreshconf) 
[n:127.0.0.1:57043_ c:gettingstarted s:shard1 r:core_node3 
x:gettingstarted_shard1_replica1] o.a.s.c.SolrCore core reload 
gettingstarted_shard1_replica1
   [junit4]   2> 1613956 INFO  (SolrConfigHandler-refreshconf) 
[n:127.0.0.1:32921_ c:gettingstarted s:shard2 r:core_node2 
x:gettingstarted_shard2_replica1] o.a.s.c.SolrCore core reload 
gettingstarted_shard2_replica1
{code}
Then the script triggers the delete:
{code}
   [junit4]   1> Deleting collection 'gettingstarted' using command:
   [junit4]   1> 
http://127.0.0.1:50975/admin/collections?action=DELETE&name=gettingstarted
{code}
The delete is actually in progress while {{SolrConfigHandler-refreshconf}} 
is still reloading the config; the reload breaks with an NPE, and this might 
cause a leak. 
The leak dump also shows that the leaked objects were created by the core 
create command, not during the reload. 

The question is: should it really reload the core twice? Can't 
{{SolrConfigHandler.Command.handleGET}} be synchronous, or pollable via async? 
cc [~noble.paul]
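
To make the "reload twice" question concrete, here is a minimal sketch of one way two triggers for the same overlay version could be collapsed into a single reload. It is not Solr code: {{ReloadCoalescer}} and {{reloadIfNewer}} are hypothetical names, and I am assuming each trigger knows the ZK version of the overlay it reacts to (the log above mentions "version 0").

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper, not part of Solr: several triggers (for example a ZK
 * watcher and a request handler) may ask for a reload of the same config
 * version, but only the first request actually runs it.
 */
final class ReloadCoalescer {
    // Highest config version for which a reload has been claimed; -1 means none yet.
    private final AtomicInteger lastClaimedVersion = new AtomicInteger(-1);

    /**
     * Runs the reload action only if zkVersion is newer than any version already claimed.
     *
     * @return true if this call ran the reload, false if it was skipped as a duplicate
     */
    boolean reloadIfNewer(int zkVersion, Runnable reload) {
        while (true) {
            int seen = lastClaimedVersion.get();
            if (zkVersion <= seen) {
                return false; // another trigger already claimed this version
            }
            if (lastClaimedVersion.compareAndSet(seen, zkVersion)) {
                reload.run();
                return true;
            }
            // Lost the race against a concurrent caller; re-check the claimed version.
        }
    }
}
```

With this, if both the listener and the refresh thread call {{reloadIfNewer(0, ...)}}, the action runs once and the second call returns false. The sketch does not wait for a reload that is still in progress and does not roll back the claimed version if the reload throws, so it only addresses the duplicate trigger, not the race with the collection delete.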

> SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection failure caused 
> by NullPointerException at SolrMetricManager.loadShardReporters
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10588
>                 URL: https://issues.apache.org/jira/browse/SOLR-10588
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mikhail Khludnev
>         Attachments: consoleFull.html.zip
>
>
> https://builds.apache.org/job/Lucene-Solr-Tests-master/1788/testReport/junit/junit.framework/TestSuite/org_apache_solr_cloud_SolrCloudExampleTest/
> This NPE, even if it might be quite reasonable in itself, breaks core reload 
> and applying the config param. I'm -not- sure, -how- it -'s related- causes these 
> constant failures.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
