[
https://issues.apache.org/jira/browse/SOLR-18236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079891#comment-18079891
]
David Smiley commented on SOLR-18236:
-------------------------------------
+I developed this problem statement and plan with Claude and I certainly
reviewed everything. Didn't validate every claim but overall makes sense with
what I've seen. I don't have conviction of all elements of the plan. For
example, I'd prefer the changes here for 10.1 with no need for waiting till 11.+
h2. 1. Proposed Vision
Three exception types, three distinct concepts. Each carries its weight; none
lies about what it represents.
h3. IOException — local / transport failure
Reserved for genuine I/O at the transport layer that prevents a
request-response exchange from completing (socket closed mid-read, EOF before
headers, connection refused).
*Critical rule:* stop wrapping every IOException in SolrServerException. Let it
propagate. This is the original, correct intent of the +throws IOException+
clause.
h3. SolrServerException — client-side orchestration / pre-flight failure
Failures that happen on the client side, _before or instead of_ a successful
HTTP exchange:
* No live servers in the LB pool
* Time/processing limits exceeded inside the LB or async client
* Retries exhausted in CloudSolrClient
* Missing default collection (configuration / pre-flight validation)
* Connection refused (logical "I couldn't reach a server"), URI syntax errors
* Timeout waiting for headers (request never produced a server response)
Stays a checked {+}Exception{+}. Its semantics narrow: "I couldn't get a
response from a Solr server."
h3. RemoteSolrException — server replied with an HTTP error
Re-parented under SolrServerException. Becomes {*}checked{*}. Carries: +code()+
(HTTP status), {+}getMetadata(){+}, {+}getDetails(){+},
{+}getRemoteErrorObject(){+}, {+}shouldSkipRetry(){+}.
Triggered iff the server returned a non-2xx status code. Connection-refused /
timeout / parse-failure cases move out of this class into SolrServerException
or IOException.
h3. Resulting SolrClient.request signature
{code:java}
public abstract NamedList<Object> request(SolrRequest<?> req, String coll)
throws SolrServerException, IOException;
{code}
Unchanged signature. Javadoc now means what it says. A single +catch
(SolrServerException)+ covers both client-orchestration failures and
server-replied-with-error.
{noformat}
Throwable
├─ Exception
│ ├─ IOException (transport)
│ └─ SolrServerException (client orchestration; checked)
│ └─ RemoteSolrException (server returned HTTP error;
checked)
└─ RuntimeException
└─ SolrException (shared error w/ HTTP-code shape;
unchanged)
{noformat}
h3. SolrException — the shared HTTP-shaped error type
SolrException lives in +org.apache.solr.common+ (intentionally shared between
server and SolrJ). There are ~88 +throw new SolrException+ sites inside the
SolrJ build:
* ~70 in +o.a.s.common.*+ shared utilities (SolrParams, Aliases, ClusterState,
DocRouter, FacetParams, NamedList, Utils, etc.)
* ~15 in +o.a.s.client.solrj.*+ proper (pre-flight validation, response parser
failures, CloudSolrClient cluster-state checks)
*Decision: keep SolrException structurally unchanged.* SolrException and
SolrServerException are parallel hierarchies:
* *SolrException* — "I'm encountering an error; here's its HTTP-shape."
Unchecked. Throwable from anywhere — server handlers, shared validators,
pre-flight checks.
* *SolrServerException* — "I, as a SolrClient, cannot deliver a response to my
caller." Checked. Post-{+}request(){+} API contract.
RemoteSolrException bridges both by duplicating SolrException's
+code/metadata/details+ accessors while subclassing SolrServerException.
*Mandatory: fix the missing javadoc on SolrException* (currently just {+}/**
*/{+}). Document its role, cross-reference RemoteSolrException and
SolrServerException, document {+}code(){+}, {+}getMetadata(){+},
{+}getDetails(){+}, MDC logging helpers.
*Out of scope (future cleanup):* The ~15 strict-client SolrException throws in
+o.a.s.client.solrj.*+ (pre-flight → SolrServerException, parser failures →
IOException) each flip unchecked → checked at caller signatures — separate
breaking-change pass.
----
h2. 2. Changes
h3. 2a. Type hierarchy
*RemoteSolrException* — re-parent from +extends SolrException+ to {+}extends
SolrServerException{+}. Loses inherited
{+}code(){+}/{+}metadata{+}/{+}details{+} accessors; duplicate those fields and
accessors directly on RemoteSolrException.
*SolrServerException* — add a constructor accepting a cause Throwable plus HTTP
code (so RemoteSolrException can chain through).
h3. 2b. Stop wrapping IOException in SolrServerException
* *HttpJettySolrClient.java:495-514* — drop the IOException →
SolrServerException translation; let IOExceptions propagate. ConnectException:
one explicit translation kept. Timeout: stays SolrServerException.
* *HttpJdkSolrClient.java:182-260* — same audit. RuntimeException wrap at line
199: keep (genuinely surprising). URISyntaxException at line 259: keep as
SolrServerException (pre-flight).
* *HttpSolrClientBase.processErrorsAndResponse:299-300* — IOException reading
response body should propagate as IOException, not be re-thrown as
RemoteSolrException.
* *ConcurrentUpdateBaseSolrClient.java* — similar audit.
h3. 2c. Server-error path becomes checked
All +throw new RemoteSolrException(...)+ sites in HttpSolrClientBase and
ConcurrentUpdateBaseSolrClient now throw a checked exception.
+processErrorsAndResponse+ already declares {+}throws SolrServerException{+},
which now covers RemoteSolrException. No signature changes propagate further up
the call chain.
h3. 2d. Javadoc rewrite
SolrClient.java — update every method's javadoc:
{noformat}
@throws IOException low-level transport failure (socket, parse)
@throws SolrServerException client-side failure (no live server, timeout,
retries exhausted),
OR server returned an HTTP error (a RemoteSolrException); inspect the
type
and call code() to distinguish.
{noformat}
Update class javadocs on SolrException, SolrServerException,
RemoteSolrException.
h3. 2e. Files to modify
# +solr/solrj/src/java/org/apache/solr/client/solrj/RemoteSolrException.java+
— re-parent, duplicate fields
# +solr/solrj/src/java/org/apache/solr/client/solrj/SolrServerException.java+
— add constructors, javadoc
# +solr/solrj/src/java/org/apache/solr/common/SolrException.java+ — write the
missing class javadoc
# +solr/solrj/src/java/org/apache/solr/client/solrj/SolrClient.java+ — javadoc
on every throws-bearing method
#
+solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClientBase.java+
— IOException pass-through in processErrorsAndResponse
#
+solr/solrj-jetty/src/java/org/apache/solr/client/solrj/jetty/HttpJettySolrClient.java+
— drop IOException wrapping in request
#
+solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpJdkSolrClient.java+
— same audit
#
+solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateBaseSolrClient.java+
— same
# +solr/CHANGES.txt+ — migration entry
----
h2. 3. Backwards-Compatibility Concerns
h3. 3a. Source-compat for callers — mostly safe
||Pattern||Sites||Result after change||
|+catch (SolrServerException \| IOException e)+|52 in tree + unknown
third-party|*No change* — RSE caught by SolrServerException arm|
|+catch (SolrServerException e)+ alone|11 in solrj|Now also catches RSE.
Improvement: callers were discarding HTTP-code anyway.|
|+catch (RemoteSolrException e)+|45 in solrj|*No change* — still compiles,
behavior unchanged|
|+catch (SolrException e)+ expecting to catch RSE|unknown|*Breaks* — RSE no
longer extends SolrException|
h3. 3b. Source-compat for propagators — the actual breakage
Code that propagates RemoteSolrException through a method *without* a +throws
SolrServerException+ declaration will fail to compile:
* +Runnable.run()+ bodies issuing SolrJ calls
* Lambda callbacks, async handlers, executor tasks
* Methods with no +throws+ clause currently letting RSE escape unannounced
These sites must add a +throws+ clause, wrap, or catch. Audit pass needed
across the codebase. Third-party callers need a migration note.
h3. 3c. Wire compatibility
The wire payload between server and SolrJ client is {*}unchanged{*}.
RemoteSolrException is constructed locally from parsed response bytes.
SolrException is unchanged — server-side error reporting (ResponseUtils,
BinaryResponseWriter) is untouched.
h3. 3d. Migration plan — two majors
*Solr 10.x (non-breaking groundwork):*
# Add new constructors on SolrServerException needed by RemoteSolrException.
# Rewrite javadocs on SolrClient, RemoteSolrException, SolrServerException,
SolrException.
# Stop wrapping IOException in SolrServerException in the four impl files.
(Observable but defensible; CHANGES.txt entry required.)
# Internal audit: find all sites in solr/ that propagate RemoteSolrException
without a +throws+ clause. Add explicit clauses or catch blocks.
*Solr 11.0 (structural break): _{color:#ffab00}or we just do this in
10.x{color}_*
# Re-parent {+}RemoteSolrException extends SolrServerException{+}. Duplicate
code/metadata/details fields directly on RemoteSolrException.
# CHANGES.txt + upgrade-notes entry: "RemoteSolrException is now checked. Code
that previously relied on it being a RuntimeException to propagate without a
+throws+ clause will no longer compile. Add +throws SolrServerException+ or
catch."
h3. 3e. Risks not worth taking
* *Don't* make SolrException extend SolrServerException — flips all of
SolrException from unchecked to checked. Catastrophic for server-side handlers
and +o.a.s.common+ validators.
* *Don't* make RemoteSolrException extend IOException — calling a 404 an "I/O
error" is the lie this cleanup is fixing.
* *Don't* rename anything — stable public API; deprecation cycle for cosmetic
value is not worth it.
h2. Verification
* Test: request to a server returning 500 — assert RemoteSolrException is
caught by +catch (SolrServerException e)+ and +e.code() == 500+
* Test: request to an unreachable port — assert raw IOException propagates
without SolrServerException wrapping
* Audit: +grep -r "catch (SolrException" solr/+ — find sites that relied on
RSE being a SolrException
* Audit: search lambdas/Runnables propagating RSE implicitly
> SolrJ exception hiearchy needs work
> -----------------------------------
>
> Key: SOLR-18236
> URL: https://issues.apache.org/jira/browse/SOLR-18236
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Reporter: David Smiley
> Priority: Major
>
> +SolrClient.request()+ declares +throws SolrServerException, IOException+
> with javadoc claiming:
> * *IOException* — "communication error with the server"
> * *SolrServerException* — "error on the server"
> Reality is the inverse on both axes:
> ||Scenario||Actual exception||Checked?||
> |Server returns non-2xx|RemoteSolrException (extends SolrException →
> RuntimeException)|*unchecked*|
> |Response body read fails (true I/O error)|RemoteSolrException|*unchecked*|
> |Server timeout / connection refused|SolrServerException (wraps the
> IOException)|checked|
> |No live servers / retries exhausted|SolrServerException|checked|
> |Raw IOException propagated as-is|almost never|—|
> h3. Contradictions
> # "Error on the server" (4xx/5xx reply) surfaces as *unchecked*
> RemoteSolrException — the javadoc says it should be SolrServerException.
> Compiler cannot help callers handle the most common failure mode.
> # "Communication error" rarely surfaces as IOException. Every concrete
> SolrClient wraps IOException (ConnectException, socket timeout, etc.) into
> SolrServerException. The +throws IOException+ clause is essentially dead.
> # HttpSolrClientBase.processErrorsAndResponse catches a real I/O error
> reading the response body and re-throws it as RemoteSolrException — calling a
> transport failure a "remote error."
> # 52 call sites just do +catch (SolrServerException | IOException e)+ and
> treat them identically — the split has no observable value at most call sites.
> # Yet 45 sites specifically catch RemoteSolrException to inspect +e.code()+
> — the HTTP-status-bearing exception is the genuinely useful concept, and it
> is the one the type system hides.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]