[
https://issues.apache.org/jira/browse/SOLR-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768110#comment-17768110
]
Chris M. Hostetter edited comment on SOLR-16992 at 9/22/23 5:57 PM:
--------------------------------------------------------------------
{quote}given the proliferation of this pattern across multiple classes, does it
make sense to move this to a common util class and call it everywhere? not sure
if possible but it would help nail it down once for all cases.
{quote}
yeah ... maybe some sort of {{public static <T> List<T>
submitAllAndAwaitAggregatingExceptions(ExecutorService exec, List<Callable<T>>
tasks) throws SolrException}} that returns the list of all the results from the
{{Future.get()}} calls or throws a {{SolrException}} if one of more {{get()}}
calls throws an exception, with the first exception encountered as the
{{getCause()}} and the rest as {{getSuppressed()}}
{quote}it feels like there is some overlap between these 2 ideas. not sure how
the impls would vary but I like waiting for everything to finish and discarding
any useless results (prioritize the exception).
{quote}
Hmmm... i'm not really sure how they would overlap?
It seems to me we either:
* wait for every Future, aggregated the results (and or throw an aggregated
exceptions), then shutdown
OR
* as soon as any {{Future.get()}} call throws an exception, call
{{shutdownNow}} (in place of the current {{{}shutdown{}}}) then
{{awaitTermination()}} to make sure any currently executing {{StreamOpen}}
instances are done, so ther is no risk they are still running when we throw
that exception we caught
{quote}> Change SolrClientCache so that any method that will add to solrClients
throws an IllegalStateException if isClosed
-0 it doesn't feel like this class is a correct place to tackle this.
{quote}
I don't really understand your response (particularly given your responses to
my later ideas) so i'm suspicious that i was vague to the point of confusing
you.
what i'm suggesting is that if this sequence of code happens...
{code:java}
SolrClientCache cache = new SolrClientCache();
cache.close();
if (cloudMode) {
cache.getCloudSolrClient(zkHost); // this should throw IllegalStateException
} else {
cache.getHttpSolrClient(url); // and/or this should throw
IllegalStateException
}
{code}
{quote}I think it can be legal to call close twice (treat it as an idempotent
operation), but not open after close that feels like a strong breach of api
contract.
{quote}
correct. the {{Closeable.close()}} must be idempotent, but other methods in
{{Closeable}} impls are allowed to behave any way we want if they are caled
after {{close()}}
{quote}Another idea can we shortcut `StreamOpener` to bail early if stream is
closed? so we avoid some noise in the logs.
{quote}
Even if we re-wrote {{StreamOpener}} to look something like this, there would
still be a concurrency race condition...
{code:java}
@Override
public TupleWrapper call() throws Exception {
if (stream.isClosed()) {
throw new Exception("Stream closed");
} // RACE CONDITION: at this point some other thread could call
stream.close()
stream.open();
TupleWrapper wrapper = new TupleWrapper(stream, comp);
...
{code}
checking "isClosed ?" really needs to happen internally to the logic in the
{{TupleStream}} instances, and it either needs to happen in blocks
{{syncrhonized}} by the same gate, or using a {{private volatile boolean
isClosed}}
was (Author: hossman):
{quote}given the proliferation of this pattern across multiple classes, does it
make sense to move this to a common util class and call it everywhere? not sure
if possible but it would help nail it down once for all cases.
{quote}
yeah ... maybe some sort of {{public static <T> List<T>
submitAllAndAwaitAggregatingExceptions(ExecutorService exec, List<Callable<T>>
tasks) throws SolrException}} that returns the list of all the results from the
{{Future.get()}} calls or throws a {{SolrException}} if one of more {{get()}}
calls throws an exception, with the first exception encountered as the
{{getCause()}} and the rest as {{getSuppressed()}}
{quote}it feels like there is some overlap between these 2 ideas. not sure how
the impls would vary but I like waiting for everything to finish and discarding
any useless results (prioritize the exception).
{quote}
Hmmm... i'm not really sure how they would overlap?
It seems to me we either:
* wait for every Future, aggregated the results (and or throw an aggregated
exceptions), then shutdown
OR
* as soon as any {{Future.get()}} call throws an exception, call
{{shutdownNow}} (in place of the current {{{}shutdown{}}}) then
{{awaitTermination()}} to make sure any currently executing {{StreamOpen}}
instances are done, so ther is no risk they are still running when we throw
that exception we caught
{quote}> Change SolrClientCache so that any method that will add to solrClients
throws an IllegalStateException if isClosed
-0 it doesn't feel like this class is a correct place to tackle this.
{quote}
I don't really understand your response (particularly given your responses to
my later ideas) so i'm suspicious that i was vague to the point of confusing
you.IllegalStateException
what i'm suggesting is that if this sequence of code happens...
{code:java}
SolrClientCache cache = new SolrClientCache();
cache.close();
if (cloudMode) {
cache.getCloudSolrClient(zkHost); // this should throw IllegalStateException
} else {
cache.getHttpSolrClient(url); // and/or this should throw
IllegalStateException
}
{code}
{quote}I think it can be legal to call close twice (treat it as an idempotent
operation), but not open after close that feels like a strong breach of api
contract.
{quote}
correct. the {{Closeable.close()}} must be idempotent, but other methods in
{{Closeable}} impls are allowed to behave any way we want if they are caled
after {{close()}}
{quote}Another idea can we shortcut `StreamOpener` to bail early if stream is
closed? so we avoid some noise in the logs.
{quote}
Even if we re-wrote {{StreamOpener}} to look something like this, there would
still be a concurrency race condition...
{code:java}
@Override
public TupleWrapper call() throws Exception {
if (stream.isClosed()) {
throw new Exception("Stream closed");
} // RACE CONDITION: at this point some other thread could call
stream.close()
stream.open();
TupleWrapper wrapper = new TupleWrapper(stream, comp);
...
{code}
checking "isClosed ?" really needs to happen internally to the logic in the
{{TupleStream}} instances, and it either needs to happen in blocks
{{syncrhonized}} by the same gate, or using a {{private volatile boolean
isClosed}}
> Non-reproducible StreamingTest failures -- suggests CloudSolrStream
> concurency race condition
> ---------------------------------------------------------------------------------------------
>
> Key: SOLR-16992
> URL: https://issues.apache.org/jira/browse/SOLR-16992
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments:
> OUTPUT-org.apache.solr.client.solrj.io.stream.StreamingTest.txt,
> thetaphi_solr_Solr-main-Linux_14679.log.txt
>
>
> Roughly 3% of all jenkins jobs that run {{StreamingTest}} wind up having
> suite level failures.
> These failures have historically taken the form of
> {{com.carrotsearch.randomizedtesting.ThreadLeakError}} and the leaked threads
> all have names like
> {{"h2sc-718-thread-2"}} indicating that they come from the internal
> {{ExecutorService}} of an {{{}Http2SolrClient{}}}.
> In my experience, the seeds from these failures have never reproduced -
> suggesting that the problem is related to concurrency.
> SOLR-16983 restored the (correct) use of {{ObjectReleaseTracker}} which in
> theory should help pinpoint where {{Http2SolrClient}} instances might not be
> getting closed (by causing {{ObjectReleaseTracker}} to fail with stacktraces
> of when/where any unclosed instances were created - ie: which test method)
> In practice, I have managed to force one failure from {{StreamingTest}} since
> the SOLR-16983 changes (logs to be attached soon) - but it still didn't
> indicate any leaked/unclosed {{Http2SolrClient}} instances. What it instead
> indicated was a _single_ unclosed {{InputStream}} instance related to
> {{Http2SolrClient}} connections (SOLR-16983 also added better tracking of
> this) coming from {{StreamingTest.testExceptionStream}} - a test method that
> opens _five_ very similar {{ExceptionStream}} instances, wrapping
> {{CloudSolrStream}} instance, which expect to trigger server side errors.
> By it's very design, {{ExceptionStream}} catches & records any exceptions
> from the stream it wraps, so even in the event of these "expected" server
> side errors, {{ExceptionStream.close()}} should still be correctly getting
> called (and propagating down to the {{CloudStream}} it wraps).
> I believe the underlying problem has to do with a concurrency race condition
> between the call to {{CloudStream.close()}} and the {{ExecutorService}} used
> internally by {{CloudSolrStream.openStreams()}} (details to follow)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]