Chris M. Hostetter created SOLR-16992:
-----------------------------------------
Summary: Non-reproducible StreamingTest failures -- suggests
CloudSolrStream concurrency race condition
Key: SOLR-16992
URL: https://issues.apache.org/jira/browse/SOLR-16992
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
Roughly 3% of all Jenkins jobs that run {{StreamingTest}} wind up having
suite-level failures.
These failures have historically taken the form of
{{com.carrotsearch.randomizedtesting.ThreadLeakError}} and the leaked threads
all have names like
{{"h2sc-718-thread-2"}} indicating that they come from the internal
{{ExecutorService}} of an {{Http2SolrClient}}.
In my experience, the seeds from these failures have never reproduced -
suggesting that the problem is related to concurrency.
SOLR-16983 restored the (correct) use of {{ObjectReleaseTracker}} which in
theory should help pinpoint where {{Http2SolrClient}} instances might not be
getting closed (by causing {{ObjectReleaseTracker}} to fail with stack traces of
when/where any unclosed instances were created, i.e., which test method).
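To illustrate the general idea of tracking unreleased objects by their creation stack trace, here is a minimal sketch. {{MiniReleaseTracker}} and all of its methods are hypothetical names made up for this example; it is not Solr's {{ObjectReleaseTracker}} and makes no claim about how that class is implemented.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified tracker for illustration only; NOT Solr's ObjectReleaseTracker.
public class MiniReleaseTracker {
    // Maps each live object (by identity) to a Throwable captured when it was tracked.
    private static final Map<Object, Throwable> LIVE = new IdentityHashMap<>();

    // Record the object together with the stack trace of the caller that created it.
    public static synchronized void track(Object o) {
        LIVE.put(o, new Throwable("tracked here: " + o.getClass().getSimpleName()));
    }

    // Forget the object once it has been properly closed.
    public static synchronized void release(Object o) {
        LIVE.remove(o);
    }

    // One creation stack trace per object that was tracked but never released.
    public static synchronized List<Throwable> unreleased() {
        return new ArrayList<>(LIVE.values());
    }

    // Reset state between test suites.
    public static synchronized void clear() {
        LIVE.clear();
    }
}
```

A test harness built on this pattern would call {{unreleased()}} at the end of a suite and fail if the list is non-empty, printing each stack trace to show where the leaked object came from.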
In practice, I have managed to force one failure from {{StreamingTest}} since
the SOLR-16983 changes (logs to be attached soon) - but it still didn't
indicate any leaked/unclosed {{Http2SolrClient}} instances. What it instead
indicated was a _single_ unclosed {{InputStream}} instance related to
{{Http2SolrClient}} connections (SOLR-16983 also added better tracking of this)
coming from {{StreamingTest.testExceptionStream}} - a test method that opens
_five_ very similar {{ExceptionStream}} instances, each wrapping a {{CloudSolrStream}}
instance, which are expected to trigger server-side errors.
By its very design, {{ExceptionStream}} catches & records any exceptions from
the stream it wraps, so even in the event of these "expected" server side
errors, {{ExceptionStream.close()}} should still be correctly getting called
(and propagating down to the {{CloudSolrStream}} it wraps).
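The catch-and-record behavior described above can be sketched roughly as follows. The {{Source}}, {{CatchingSource}} and {{FailingSource}} types are hypothetical stand-ins invented for this example, not Solr classes; the point is only that a failed read should not prevent {{close()}} from reaching the wrapped stream.

```java
import java.io.Closeable;
import java.io.IOException;

public class CatchingWrapperSketch {
    // Hypothetical stand-in for a stream of results; not a Solr interface.
    interface Source extends Closeable {
        String read() throws IOException;
    }

    // Catches and records errors from the wrapped source instead of rethrowing them.
    static final class CatchingSource implements Source {
        private final Source inner;
        private Exception recorded;

        CatchingSource(Source inner) { this.inner = inner; }

        @Override public String read() {
            try {
                return inner.read();
            } catch (Exception e) {
                recorded = e;                       // remember the failure
                return "EXCEPTION: " + e.getMessage();
            }
        }

        // close() always delegates, whether or not read() failed earlier.
        @Override public void close() throws IOException { inner.close(); }

        Exception recorded() { return recorded; }
    }

    // Simulates a source whose read always hits an "expected" server-side error.
    static final class FailingSource implements Source {
        boolean closed;
        @Override public String read() throws IOException {
            throw new IOException("server side error");
        }
        @Override public void close() { closed = true; }
    }

    // Returns true if the inner source was closed even though read() failed.
    public static boolean closePropagatesAfterError() throws IOException {
        FailingSource inner = new FailingSource();
        try (CatchingSource wrapper = new CatchingSource(inner)) {
            wrapper.read();
        }
        return inner.closed;
    }
}
```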
I believe the underlying problem has to do with a concurrency race condition
between the call to {{CloudSolrStream.close()}} and the {{ExecutorService}} used
internally by {{CloudSolrStream.openStreams()}} (details to follow).
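As a generic illustration of how a close-versus-executor race can leave exactly one resource unclosed, here is a small sketch that forces the bad interleaving with a latch. Everything in it ({{CloseRaceSketch}}, {{Resource}}, {{register}}) is hypothetical; it does not reproduce the actual {{CloudSolrStream}} code, and whether the real failure follows this pattern is an assumption, not something established here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CloseRaceSketch {
    // Hypothetical resource standing in for an open connection or InputStream.
    static final class Resource {
        volatile boolean closed;
        void close() { closed = true; }
    }

    private final List<Resource> opened = new ArrayList<>();

    private synchronized void register(Resource r) { opened.add(r); }

    // Closes only what is registered so far; anything registered later is missed.
    synchronized void close() {
        for (Resource r : opened) {
            r.close();
        }
    }

    synchronized long countUnclosed() {
        return opened.stream().filter(r -> !r.closed).count();
    }

    // Forces the interleaving: close() completes while the worker is still "opening".
    public static long runForcedInterleaving() throws InterruptedException {
        CloseRaceSketch owner = new CloseRaceSketch();
        CountDownLatch closeDone = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(() -> {
            try {
                closeDone.await();            // simulate a slow open that outlives close()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            owner.register(new Resource());   // registered after close(): never closed
        });
        owner.close();                        // nothing registered yet, so nothing is closed
        closeDone.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return owner.countUnclosed();
    }
}
```

In a real system the interleaving would depend on thread scheduling rather than a latch, which would be consistent with failures that do not reproduce from a fixed seed.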