Chris M. Hostetter created SOLR-15589:
-----------------------------------------
Summary: DocRouter API needs redesigned to support stricter
routing rules
Key: SOLR-15589
URL: https://issues.apache.org/jira/browse/SOLR-15589
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
This issue i being opened to track some of the things observed in SOLR-8889
about how/why the existing DocRouter API (and it's usages in solrj/solr) make
it so hard to prevent/identify bugs like that.
The hope is that someone with some time/energy/interest might wish to pursue
redesigning the DocRouter API to be more "strict" so that any future bugs like
SOLR-8889 will cause "fail fast" behavior because of missing info that the
router knows is required, instead of the current behavior of sending updates to
the wrong shard due to poor assumptions...
{panel:title=Comments from SOLR-8889}
*3)* There does not appear to be any existing logic to “fail hard” in Solr if
an update (add or delete) for a collection using {{router.field}} is processed
w/o the field (or explicit {{\_route\_}} param). Instead the update is
silently sent to the wrong shard. (there might be a check/failure situation for
this in the add/update case, but I don’t see it offhand, and certainly not in
the deleteById case)
...
... looking into what it would take to "fix" #3 in some way – perhaps along the
lines of Yonik's comment in SOLR-5890 suggesting a "strict" mode when using
{{router.field}}) – because that seems like it would be the most robust way we
can be "future proof" against any bugs (like #1 & #2 above) in solrj/solr-core
code silently sending updates to the wrong shard leaders.
Unfortunately it doesn’t seem very viable to fix #3 w/o a LOT of serious
changes to the DocRouter API. The general categories of problems I’m seeing are:
* Sloppy/Loose API Contract
** some methods are documented to return “null” if there is not enough details
to determine the correct Slice for an update, but some impls throw exceptions
instead (some of the Update Chain processing code actually expects
{{DocRouter}} to throw exceptions in spite of the API documentation
** some methods take in a “route” argument in addition to an "id" argument ...
but not all methods
** some callers act aware of when a “route” has been specified by the user,
and pass that the {{DocRouter}} methods as the "id" argument in place of the
document's uniqueKey _instead_ of passing it where the “route” arg is suppose
to be passed – so the DocRouter impls never really know whether the “id” they
have been given is from the uniqueKey field or from the {{router.field}}
** Some code in Solr's Update Chain assumes a {{CompositeIdRouter}} is in use,
and directly calls internal (but public) methods from the impl (which are not
part of the DocRouter API) that don’t have access to the {{router.field}} let
alone know if the input is a uniqueKey value or a {{router.field}} value (and
worse: some of this code silently does nothing if some other DocRouter is used)
* Overly Permissive Functional Design:
** IIUC it actually seems like {{router.field}} was designed with the
expectation that if an "add" command includes a SolrInputDocument that doesn’t
include a value in the router field, then the uniqueKey should be used as a
callback to decide which shard to send it to. (ie: documents aren't required to
have a value in the {{router.field}}
Ultimately i think the entire DocRouter API (and it’s usage in Solr) should be
reconsidered, with a (strawman) goal of adding a {{strict}} option that (when
true for a collection, and in some future version of solr it should default to
true) would cause a DocRouter with {{router.field}} set to throw an Exception
anytime any {{DocRouter}} caller calls any routing related method w/o
specifying a value for the {{router.field}}. Unfortunately this seems like it
would take more time/effort then I have to focus on it right now, so I plan to
spin it off into it's own Improvement jira.
{panel}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]