[
https://issues.apache.org/jira/browse/SOLR-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769354#comment-17769354
]
Chris M. Hostetter commented on SOLR-16992:
-------------------------------------------
{quote}...looked at all uses again and the large majority filter nulls, one
would NPE and the shards ones have no nulls.
{quote}
That's fine ... if the only code paths you've changed that _don't_ previously
filter nulls are impossible to produce nulls (or would NPE if a null was
produced) that i'm fine with the idea of refactoring the null-filtering check
into {{StreamExecutorHelper.submitAllAndAwaitAggregatingExceptions}} if you
prefer that – as long as it's mentioned in the javadocs
(but i'm -1 to th null filtering being in
{{{}ExecutorUtil.submitAllAndAwaitAggregatingExceptions{}}})
Only a few questions/concerns about your current diff...
* I'm wondering if
{{StreamExecutorHelper.submitAllAndAwaitAggregatingExceptions}} and
{{ExecutorUtil.submitAllAndAwaitAggregatingExceptions}} should return
{{Collection<T>}} instead of {{List<T>}} to help discourage people from making
assumptions about the order of the results?
** Either that, or the javadocs should be explicit that the results are
returned in the same order as the tasks
*** Except if you do refactor the null filtering into
{{StreamExecutorHelper.submitAllAndAwaitAggregatingExceptions}} it definitely
shouldn't say that, and should return {{Collection}} instead of {{List}}
* In {{{}ExecutorUtil.submitAllAndAwaitAggregatingExceptions{}}}...
** {{List<T> results = new ArrayList<>()}} can be init'ed using
{{tasks.size()}} as a micro optimization to eliminate the possibility that the
{{ArrayList}} will ever need to resize itself.
** let's add a comment to the
{{tasks.stream().map(service::submit).collect(Collectors.toUnmodifiableList())}}
call ...
*** {{// Could alternatively use service.invokeAll, but this way we can start
looping over futures before all are done}}
* StreamExecutorHelperTest should extend SolrTestCase
* Generally speaking, we avoid this pattern...
**
{code:java}
try {
codeThatThrowsException()
fail("Expected exception");
} catch (TypeOfExpectedExeption e) {
assertThat(somethingAbout(e))
}
{code}
** It tends to lead to developer confusion/mistakes when modifying tests
** instead the lucene & solr code base convention is...
{code:java}
TypeOfExpectedExeption e = expetThrows(TypeOfExpectedExeption.class, "Expected
exception", () -> codeThatThrowsException());
assertThat(somethingAbout(e))
{code}
> Non-reproducible StreamingTest failures -- suggests CloudSolrStream
> concurency race condition
> ---------------------------------------------------------------------------------------------
>
> Key: SOLR-16992
> URL: https://issues.apache.org/jira/browse/SOLR-16992
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Alex Deparvu
> Priority: Major
> Attachments:
> OUTPUT-org.apache.solr.client.solrj.io.stream.StreamingTest.txt,
> thetaphi_solr_Solr-main-Linux_14679.log.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Roughly 3% of all jenkins jobs that run {{StreamingTest}} wind up having
> suite level failures.
> These failures have historically taken the form of
> {{com.carrotsearch.randomizedtesting.ThreadLeakError}} and the leaked threads
> all have names like
> {{"h2sc-718-thread-2"}} indicating that they come from the internal
> {{ExecutorService}} of an {{{}Http2SolrClient{}}}.
> In my experience, the seeds from these failures have never reproduced -
> suggesting that the problem is related to concurrency.
> SOLR-16983 restored the (correct) use of {{ObjectReleaseTracker}} which in
> theory should help pinpoint where {{Http2SolrClient}} instances might not be
> getting closed (by causing {{ObjectReleaseTracker}} to fail with stacktraces
> of when/where any unclosed instances were created - ie: which test method)
> In practice, I have managed to force one failure from {{StreamingTest}} since
> the SOLR-16983 changes (logs to be attached soon) - but it still didn't
> indicate any leaked/unclosed {{Http2SolrClient}} instances. What it instead
> indicated was a _single_ unclosed {{InputStream}} instance related to
> {{Http2SolrClient}} connections (SOLR-16983 also added better tracking of
> this) coming from {{StreamingTest.testExceptionStream}} - a test method that
> opens _five_ very similar {{ExceptionStream}} instances, wrapping
> {{CloudSolrStream}} instance, which expect to trigger server side errors.
> By it's very design, {{ExceptionStream}} catches & records any exceptions
> from the stream it wraps, so even in the event of these "expected" server
> side errors, {{ExceptionStream.close()}} should still be correctly getting
> called (and propagating down to the {{CloudStream}} it wraps).
> I believe the underlying problem has to do with a concurrency race condition
> between the call to {{CloudStream.close()}} and the {{ExecutorService}} used
> internally by {{CloudSolrStream.openStreams()}} (details to follow)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]