[ 
https://issues.apache.org/jira/browse/SOLR-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798857#comment-16798857
 ] 

Cao Manh Dat edited comment on SOLR-13276 at 4/4/19 9:44 AM:
-------------------------------------------------------------

Thanks Hoss. I tried to reproduce the failure, but it is hard to reproduce even 
on a Windows machine.
It seems that even SolrCloudTest sees the same failure (log attached), so the 
failure does not appear to have been introduced by the changes made in this 
issue.

From the attached log, I suspect the cause of the problem is that IndexFetcher 
is kicked off while CoreContainer is shutting down, so the core cannot be 
released.
From {{thetaphi_Lucene-Solr-8.x-Windows_69.log.txt}}:

Two nodes are shutting down:
{code}
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1698101756
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61571_solr
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1055741610
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61566_solr
{code}

After that, indexFetcher failed to close a core, which led to the leak error:
{code}
   [junit4]   2> 151088 ERROR (indexFetcher-1096-thread-1) [    ] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\build\solr-solrj\test\J0\temp\solr.client.solrj.impl.CloudHttp2SolrClientTest_6DA5B1A938CC311D-001\tempDir-006\node1\.\replicaTypesTestColl_shard2_replica_p10\data\index;done=true>>
{code}
Therefore, I think that SOLR-13339 may resolve this failure.
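
To illustrate the suspected sequence, here is a minimal Java sketch. It is not Solr code: {{RefCountedDir}}, {{shutdown}}, the thread name, and the timeouts are all hypothetical stand-ins. It only shows how a reference taken by a background task and never released makes a timed close give up, as in the "Timeout waiting for all directory ref counts to be released" message above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RefCountLeakSketch {

    /** Hypothetical stand-in for a ref-counted cached directory; not Solr's actual class. */
    static final class RefCountedDir {
        private int refCount = 0;

        synchronized void incRef() { refCount++; }

        synchronized void decRef() {
            refCount--;
            notifyAll(); // wake up a close() call that may be waiting
        }

        /** Waits up to timeoutMs for all references to be released; returns false if it gives up. */
        synchronized boolean close(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (refCount > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // gave up waiting; a reference is still held
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
            return true;
        }
    }

    /**
     * Simulates a shutdown while a background "fetcher" holds a reference.
     * If fetcherHoldsRef is true, the fetcher keeps its reference until after close() has returned.
     * Returns whether close() succeeded.
     */
    static boolean shutdown(boolean fetcherHoldsRef, long timeoutMs) {
        RefCountedDir dir = new RefCountedDir();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread fetcher = new Thread(() -> {
            dir.incRef();
            acquired.countDown();
            try {
                release.await(); // hold the reference until told to let go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                dir.decRef();
            }
        }, "fetcher-sketch");
        try {
            fetcher.start();
            acquired.await();
            if (!fetcherHoldsRef) {
                release.countDown(); // well-behaved case: released before close
            }
            boolean closed = dir.close(timeoutMs);
            release.countDown(); // let the fetcher finish so the sketch exits cleanly
            fetcher.join();
            return closed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fetcher releases its ref: closed=" + shutdown(false, 2000));
        System.out.println("fetcher holds its ref:    closed=" + shutdown(true, 200));
    }
}
```

In this sketch the second call returns {{false}} because one reference is still outstanding when the timeout expires, which is the same shape as the {{refCount=1}} seen in the log. Whether the real failure follows exactly this path is still only a suspicion.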


was (Author: caomanhdat):
Thanks Hoss. I tried to reproduce the failure, but it is hard to reproduce even 
on a Windows machine.
It seems that even SolrCloudTest sees the same failure (log attached), so the 
failure does not appear to have been introduced by the changes made in this 
issue.

From the attached log, I suspect the cause of the problem is that IndexFetcher 
is kicked off while CoreContainer is shutting down, so the core cannot be 
released.
From {{thetaphi_Lucene-Solr-8.x-Windows_69.log.txt}}:

Two nodes are shutting down:
{code}
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1698101756
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61571_solr
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1055741610
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61566_solr
{code}

After that, indexFetcher failed to close a core, which led to the leak error:
{code}
   [junit4]   2> 151088 ERROR (indexFetcher-1096-thread-1) [    ] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\build\solr-solrj\test\J0\temp\solr.client.solrj.impl.CloudHttp2SolrClientTest_6DA5B1A938CC311D-001\tempDir-006\node1\.\replicaTypesTestColl_shard2_replica_p10\data\index;done=true>>
{code}
Therefore, I think that SOLR-13336 may resolve this failure.

> Adding Http2 equivalent classes of CloudSolrClient and 
> HttpClusterStateProvider 
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-13276
>                 URL: https://issues.apache.org/jira/browse/SOLR-13276
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>             Fix For: 8.1
>
>         Attachments: SOLR-13276.patch, SOLR-13276.patch, SOLR-13276.patch, 
> thetaphi-Lucene-Solr-master-Windows-7810.txt, 
> thetaphi_Lucene-Solr-8.x-Windows_69.log.txt, 
> thetaphi_Lucene-Solr-master-Windows_7754.log.txt
>
>
> Before we can move on and wipe out the usage of Apache HttpClient inside 
> Solr-core, we need to create HTTP/2 equivalent classes of CloudSolrClient and 
> HttpClusterStateProvider.


