[ 
https://issues.apache.org/jira/browse/SOLR-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839807#comment-16839807
 ] 

Hoss Man edited comment on SOLR-13470 at 5/14/19 9:25 PM:
----------------------------------------------------------

Here's an example of how things can go wrong with the attached patch that 
hardens up the asserts regarding the expected exception...

{noformat}
...
   [junit4]   2> 12780 WARN  (explicit-fetchindex-cmd) [    ] 
o.a.s.h.IndexFetcher File segments_2 did not match. expected checksum is 
1298216336 and actual is checksum 1205232915. expected length is 252 and actual 
length is 252
   [junit4]   2> 12780 INFO  (explicit-fetchindex-cmd) [    ] 
o.a.s.h.IndexFetcher This disk does not have enough space to download the index 
from leader/master. So cleaning up the local index.  This may lead to loss of 
data/or node if index replication fails in between
   [junit4]   2> 12785 INFO  (Thread-19) [    ] o.a.s.c.s.i.HttpSolrClient 
nocommit: remote ex (403) w/o error? ... 
{responseHeader={status=0,QTime=39},response={numFound=1200,start=0,docs=[]}}
   [junit4]   2> 12794 ERROR (Thread-19) [    ] 
o.a.s.h.TestReplicationHandlerDiskOverFlow Query Thread Failure
   [junit4]   2>           => 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:46668/solr/collection1: Forbidden
   [junit4]   2> 
   [junit4]   2> 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:46668/solr/collection1: Forbidden
   [junit4]   2> 
   [junit4]   2> request: 
http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2
   [junit4]   2>        at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
 ~[java/:?]
   [junit4]   2>        at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
 ~[java/:?]
   [junit4]   2>        at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
 ~[java/:?]
   [junit4]   2>        at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:207) 
~[java/:?]
   [junit4]   2>        at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:987) ~[java/:?]
   [junit4]   2>        at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1002) ~[java/:?]
   [junit4]   2>        at 
org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.lambda$testDiskOverFlow$1(TestReplicationHandlerDiskOverFlow.java:154)
 ~[test/:?]
   [junit4]   2>        at java.lang.Thread.run(Thread.java:834) [?:?]
...
   [junit4] HEARTBEAT J0 PID(26623@tray): 2019-05-14T14:01:33, stalled for 
69.1s at: TestReplicationHandlerDiskOverFlow.testDiskOverFlow
   [junit4]   2> 72012 INFO  (indexFetcher-27-thread-1) [    ] 
o.a.s.h.ReplicationHandler Poll disabled
   [junit4]   2> 72780 ERROR (explicit-fetchindex-cmd) [    ] 
o.a.s.h.TestReplicationHandlerDiskOverFlow IndexFetcher Thread Failure
   [junit4]   2>           => java.util.concurrent.TimeoutException
   [junit4]   2>        at 
java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258)
   [junit4]   2> java.util.concurrent.TimeoutException: null
   [junit4]   2>        at 
java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258) ~[?:?]
   [junit4]   2>        at 
java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:436) ~[?:?]
   [junit4]   2>        at 
org.apache.solr.handler.TestReplicationHandlerDiskOverFlow$2.getAsBoolean(TestReplicationHandlerDiskOverFlow.java:137)
 ~[test/:?]
   [junit4]   2>        at 
org.apache.solr.handler.IndexFetcher.deleteFilesInAdvance(IndexFetcher.java:1133)
 ~[java/:?]
   [junit4]   2>        at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1033) 
~[java/:?]
   [junit4]   2>        at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:575) 
~[java/:?]
   [junit4]   2>        at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351) 
~[java/:?]
   [junit4]   2>        at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:422) 
~[java/:?]
   [junit4]   2>        at 
org.apache.solr.handler.ReplicationHandler.lambda$fetchIndex$0(ReplicationHandler.java:345)
 ~[java/:?]
   [junit4]   2>        at java.lang.Thread.run(Thread.java:834) [?:?]
...
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestReplicationHandlerDiskOverFlow -Dtests.method=testDiskOverFlow 
-Dtests.seed=7A25E0BCF2F3D8D4 -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=kok -Dtests.timezone=MET -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   [junit4] FAILURE 70.4s | TestReplicationHandlerDiskOverFlow.testDiskOverFlow 
<<<
   [junit4]    > Throwable #1: java.lang.AssertionError: threads encountered 
failures (see logs for when) expected:<[]> but 
was:<[org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://127.0.0.1:46668/solr/collection1: Forbidden
   [junit4]    > request: 
http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2,
 java.util.concurrent.TimeoutException]>
   [junit4]    >        at 
__randomizedtesting.SeedInfo.seed([7A25E0BCF2F3D8D4:B017476C918DD2BB]:0)
   [junit4]    >        at 
org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.testDiskOverFlow(TestReplicationHandlerDiskOverFlow.java:189)
   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    >        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    >        at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
   [junit4]    >        at java.base/java.lang.Thread.run(Thread.java:834)
   [junit4]   2> NOTE: leaving temporary files on disk at: 
/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.handler.TestReplicationHandlerDiskOverFlow_7A25E0BCF2F3D8D4-001
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene80): 
{name=BlockTreeOrds(blocksize=128), id=BlockTreeOrds(blocksize=128)}, 
docValues:{}, maxPointsInLeafNode=1340, maxMBSortInHeap=5.7037203451015515, 
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@57ee8eda),
 locale=kok, timezone=MET
   [junit4]   2> NOTE: Linux 3.19.0-84-generic amd64/Oracle Corporation 11.0.2 
(64-bit)/cpus=4,threads=2,free=205025280,total=260046848
   [junit4]   2> NOTE: All tests run in this JVM: 
[TestReplicationHandlerDiskOverFlow]
   [junit4] Completed [1/1 (1!)] in 72.25s, 1 test, 1 failure <<< FAILURES!
{noformat}

...note that the added logging in HttpSolrClient shows that we're evidently 
getting a valid search result w/a 403 response code (did the query somehow 
already run successfully before the server realized it should return 403?)

But re-running this same seed multiple times will frequently pass (ie: the 
exception message makes it all the way to the client)

(Jira comment edited to fix mangled whitespace)
 


was (Author: hossman):

Here's an example of how things can go wrong with the attached patch that 
hardens up the asserts regarding the expected exception...

{noformat}
...
[junit4] 2> 12780 WARN (explicit-fetchindex-cmd) [ ] o.a.s.h.IndexFetcher File 
segments_2 did not match. expected checksum is 1298216336 and actual is 
checksum 1205232915. expected length is 252 and actual length is 252
[junit4] 2> 12780 INFO (explicit-fetchindex-cmd) [ ] o.a.s.h.IndexFetcher This 
disk does not have enough space to download the index from leader/master. So 
cleaning up the local index. This may lead to loss of data/or node if index 
replication fails in between
[junit4] 2> 12785 INFO (Thread-19) [ ] o.a.s.c.s.i.HttpSolrClient nocommit: 
remote ex (403) w/o error? ... 
{responseHeader={status=0,QTime=39},response={numFound=1200,start=0,docs=[]}}
[junit4] 2> 12794 ERROR (Thread-19) [ ] 
o.a.s.h.TestReplicationHandlerDiskOverFlow Query Thread Failure
[junit4] 2> => 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:46668/solr/collection1: Forbidden
[junit4] 2>
[junit4] 2> 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:46668/solr/collection1: Forbidden
[junit4] 2>
[junit4] 2> request: 
http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2
[junit4] 2> at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
 ~[java/:?]
[junit4] 2> at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
 ~[java/:?]
[junit4] 2> at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
 ~[java/:?]
[junit4] 2> at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:207) 
~[java/:?]
[junit4] 2> at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:987) ~[java/:?]
[junit4] 2> at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1002) ~[java/:?]
[junit4] 2> at 
org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.lambda$testDiskOverFlow$1(TestReplicationHandlerDiskOverFlow.java:154)
 ~[test/:?]
[junit4] 2> at java.lang.Thread.run(Thread.java:834) [?:?]
...
[junit4] HEARTBEAT J0 PID(26623@tray): 2019-05-14T14:01:33, stalled for 69.1s 
at: TestReplicationHandlerDiskOverFlow.testDiskOverFlow
[junit4] 2> 72012 INFO (indexFetcher-27-thread-1) [ ] 
o.a.s.h.ReplicationHandler Poll disabled
[junit4] 2> 72780 ERROR (explicit-fetchindex-cmd) [ ] 
o.a.s.h.TestReplicationHandlerDiskOverFlow IndexFetcher Thread Failure
[junit4] 2> => java.util.concurrent.TimeoutException
[junit4] 2> at 
java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258)
[junit4] 2> java.util.concurrent.TimeoutException: null
[junit4] 2> at 
java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:258) ~[?:?]
[junit4] 2> at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:436) 
~[?:?]
[junit4] 2> at 
org.apache.solr.handler.TestReplicationHandlerDiskOverFlow$2.getAsBoolean(TestReplicationHandlerDiskOverFlow.java:137)
 ~[test/:?]
[junit4] 2> at 
org.apache.solr.handler.IndexFetcher.deleteFilesInAdvance(IndexFetcher.java:1133)
 ~[java/:?]
[junit4] 2> at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1033) 
~[java/:?]
[junit4] 2> at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:575) 
~[java/:?]
[junit4] 2> at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351) 
~[java/:?]
[junit4] 2> at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:422) 
~[java/:?]
[junit4] 2> at 
org.apache.solr.handler.ReplicationHandler.lambda$fetchIndex$0(ReplicationHandler.java:345)
 ~[java/:?]
[junit4] 2> at java.lang.Thread.run(Thread.java:834) [?:?]
...
[junit4] 2> NOTE: reproduce with: ant test 
-Dtestcase=TestReplicationHandlerDiskOverFlow -Dtests.method=testDiskOverFlow 
-Dtests.seed=7A25E0BCF2F3D8D4 -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=kok -Dtests.timezone=MET -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
[junit4] FAILURE 70.4s | TestReplicationHandlerDiskOverFlow.testDiskOverFlow <<<
[junit4] > Throwable #1: java.lang.AssertionError: threads encountered failures 
(see logs for when) expected:<[]> but 
was:<[org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at http://127.0.0.1:46668/solr/collection1: Forbidden
[junit4] > request: 
http://127.0.0.1:46668/solr/collection1/select?q=*:*&rows=0&wt=javabin&version=2,
 java.util.concurrent.TimeoutException]>
[junit4] > at 
__randomizedtesting.SeedInfo.seed([7A25E0BCF2F3D8D4:B017476C918DD2BB]:0)
[junit4] > at 
org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.testDiskOverFlow(TestReplicationHandlerDiskOverFlow.java:189)
[junit4] > at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4] > at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4] > at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4] > at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[junit4] > at java.base/java.lang.Thread.run(Thread.java:834)
[junit4] 2> NOTE: leaving temporary files on disk at: 
/home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.handler.TestReplicationHandlerDiskOverFlow_7A25E0BCF2F3D8D4-001
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene80): 
{name=BlockTreeOrds(blocksize=128), id=BlockTreeOrds(blocksize=128)}, 
docValues:{}, maxPointsInLeafNode=1340, maxMBSortInHeap=5.7037203451015515, 
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@57ee8eda),
 locale=kok, timezone=MET
[junit4] 2> NOTE: Linux 3.19.0-84-generic amd64/Oracle Corporation 11.0.2 
(64-bit)/cpus=4,threads=2,free=205025280,total=260046848
[junit4] 2> NOTE: All tests run in this JVM: 
[TestReplicationHandlerDiskOverFlow]
[junit4] Completed [1/1 (1!)] in 72.25s, 1 test, 1 failure <<< FAILURES!

{noformat}

...note that the added logging in HttpSolrClient shows that we're evidently 
getting a valid search result w/a 403 response code (did the query somehow 
already run successfully before the server realized it should return 403?)

But re-running this same seed multiple times will frequently pass (ie: the 
exception message makes it all the way to the client)

 

> SolrException msg not always propogated to HttpClient (may be specific to 
> SOLR-12999 ?)
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-13470
>                 URL: https://issues.apache.org/jira/browse/SOLR-13470
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13470.patch
>
>
> While working on some test hardening for SOLR-12999, I discovered a strange 
> bug related to how SolrExceptions are propogated to HttpClients -- sometimes 
> the message set by the server side code when throwing the SolrException is 
> set in the remote exception recieved by the HttpSolrClient, other times it is 
> not.
> it's not clear to me if this is specific to the IndexFetcher related code 
> (added in SOLR-12999) that throw SolrExceptions when the index is in the 
> middle of a full copy, or if it's a general problem that can happen with any 
> SolrException->HTTP->RemoteSolrException via HttpSolrClient that only happens 
> to manifests because of some quirk in the threading of 
> TestReplicationHandlerDiskOverFlow. 
> (perhaps because we don't have a lot of HTTP level tests checking the 
> exception message?)
> At the moment, TestReplicationHandlerDiskOverFlow works around this issue by 
> only comparing the HTTP Staus code to ensure it's what's expected, w/o 
> checking the getMessage() ... I'll attach a patch that demonstrates how 
> including a getMessage() assertion can (sporadically) fail, and includes some 
> nocommit debugging code i added to HttpSolrClient to try and make sense of 
> what's happening...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to