[
https://issues.apache.org/jira/browse/SOLR-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768964#comment-17768964
]
Alex Deparvu edited comment on SOLR-16992 at 9/26/23 2:23 AM:
--------------------------------------------------------------
{quote}
I like the impl of {{SolrClientCache.close()}} a lot ... strictly speaking i
don't think {{AtomicBoolean}} is actually necessary given that (all?) the
methods are synchronized, but i don't care. i prefer your AtomicBoolean
approach over depending on the synchronization (maybe/hopefully enough of the
methods can similarly be improved to the point we can remove the
synchronization some day)
{quote}
you are correct the atomic boolean is not strictly needed.
{quote}
checkState(); is too vague of a method name to be readable in context w/o
consulting javados. The convention in lucene is {{ensureOpen(); }} which i
think is really self documenting as you skim code
{quote}
yeah, will move to ensureOpen, sounds better.
{quote}
I think having a
StreamExecutorHelper.submitAllAndAwaitAggregatingExceptions(...) method that is
re-used in multiple streams is a fine idea, but I think the "guts" of it should
be re-implemented as an ExecutorUtil class that takes an existing
ExecutorService as an argument (and does not close it even on exception)
{quote}
I agree. will move it there
{quote}
but the if (result != null) ... logic should not be part of a general
ExecutorUtil helper
arguably it shouldn't be part of
StreamExecutorHelper.submitAllAndAwaitAggregatingExceptions(...) either ... i'm
pretty sure if you check the diffs of all the methods where you are using it in
your PR not all of them currently ignore null results.
{quote}
this part, I am not sold on. looked at all uses again and the large majority
filter nulls, one would NPE and the shards ones have no nulls. but I do see the
point that a generic method does not need to concern itself with null values,
it feels very 'implementation dependent'
thanks for the review so far, will update and ping again
[edit] ninja edit. I updated the PR with all of the above except null filtering
at the call location. I can add it tomorrow morning.
was (Author: alex.parvulescu):
{quote}
I like the impl of {{SolrClientCache.close()}} a lot ... strictly speaking i
don't think {{AtomicBoolean}} is actually necessary given that (all?) the
methods are synchronized, but i don't care. i prefer your AtomicBoolean
approach over depending on the synchronization (maybe/hopefully enough of the
methods can similarly be improved to the point we can remove the
synchronization some day)
{quote}
you are correct the atomic boolean is not strictly needed.
{quote}
checkState(); is too vague of a method name to be readable in context w/o
consulting javados. The convention in lucene is {{ensureOpen(); }} which i
think is really self documenting as you skim code
{quote}
yeah, will move to ensureOpen, sounds better.
{quote}
I think having a
StreamExecutorHelper.submitAllAndAwaitAggregatingExceptions(...) method that is
re-used in multiple streams is a fine idea, but I think the "guts" of it should
be re-implemented as an ExecutorUtil class that takes an existing
ExecutorService as an argument (and does not close it even on exception)
{quote}
I agree. will move it there
{quote}
but the if (result != null) ... logic should not be part of a general
ExecutorUtil helper
arguably it shouldn't be part of
StreamExecutorHelper.submitAllAndAwaitAggregatingExceptions(...) either ... i'm
pretty sure if you check the diffs of all the methods where you are using it in
your PR not all of them currently ignore null results.
{quote}
this part, I am not sold on. looked at all uses again and the large majority
filter nulls, one would NPE and the shards ones have no nulls. but I do see the
point that a generic method does not need to concern itself with null values,
it feels very 'implementation dependent'
thanks for the review so far, will update and ping again
> Non-reproducible StreamingTest failures -- suggests CloudSolrStream
> concurency race condition
> ---------------------------------------------------------------------------------------------
>
> Key: SOLR-16992
> URL: https://issues.apache.org/jira/browse/SOLR-16992
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Alex Deparvu
> Priority: Major
> Attachments:
> OUTPUT-org.apache.solr.client.solrj.io.stream.StreamingTest.txt,
> thetaphi_solr_Solr-main-Linux_14679.log.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Roughly 3% of all jenkins jobs that run {{StreamingTest}} wind up having
> suite level failures.
> These failures have historically taken the form of
> {{com.carrotsearch.randomizedtesting.ThreadLeakError}} and the leaked threads
> all have names like
> {{"h2sc-718-thread-2"}} indicating that they come from the internal
> {{ExecutorService}} of an {{{}Http2SolrClient{}}}.
> In my experience, the seeds from these failures have never reproduced -
> suggesting that the problem is related to concurrency.
> SOLR-16983 restored the (correct) use of {{ObjectReleaseTracker}} which in
> theory should help pinpoint where {{Http2SolrClient}} instances might not be
> getting closed (by causing {{ObjectReleaseTracker}} to fail with stacktraces
> of when/where any unclosed instances were created - ie: which test method)
> In practice, I have managed to force one failure from {{StreamingTest}} since
> the SOLR-16983 changes (logs to be attached soon) - but it still didn't
> indicate any leaked/unclosed {{Http2SolrClient}} instances. What it instead
> indicated was a _single_ unclosed {{InputStream}} instance related to
> {{Http2SolrClient}} connections (SOLR-16983 also added better tracking of
> this) coming from {{StreamingTest.testExceptionStream}} - a test method that
> opens _five_ very similar {{ExceptionStream}} instances, wrapping
> {{CloudSolrStream}} instance, which expect to trigger server side errors.
> By it's very design, {{ExceptionStream}} catches & records any exceptions
> from the stream it wraps, so even in the event of these "expected" server
> side errors, {{ExceptionStream.close()}} should still be correctly getting
> called (and propagating down to the {{CloudStream}} it wraps).
> I believe the underlying problem has to do with a concurrency race condition
> between the call to {{CloudStream.close()}} and the {{ExecutorService}} used
> internally by {{CloudSolrStream.openStreams()}} (details to follow)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]