*TLDR:* I've got a working branch
<https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer &
our startup process is extracted from SolrDispatch Filter. Do other folks
think that is interesting enough that I should make a JIRA and/or PR with
intention to make this change?

*Details:*

Jetty is a servlet container, yet we more or less ignore it by stuffing
everything into a single filter (almost, admin UI is served separately).
I'm sure there are lots of historical reasons for this, probably including
servlet containers and their specs were much less mature when solr was
first started. Maybe also the early authors were more focused on making
search work than leveraging what the container could do for them (but I
wasn't there for that so that's just a guess).

The result is that we have a couple of very large classes that are touched
by almost every request, and they have a lot of conditional logic trying to
decide what the user is asking. Normally this sort of dispatch is done by
the container based on the request URL to direct it to the appropriate
servlet. Solr does a LOT of different things so this code is extremely
tricky and complex to understand. Specifically, I'm speaking of
SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
two classes probably makes them harder to understand.

It seems to me (and this mail is asking if you agree) that these classes
are long overdue for some subdivision. The most obvious thing to pull out
is all the admin calls. Admin code paths really have little or nothing to
do with query or update code paths since there are no documents to route or
sub-requests to some subset of nodes.

The primary obstacle to any such separation and simplification is that most
requests have some interaction with CoreContainer, and the things it holds,
and this is initialized and held by a field in SolrDispatchFilter. After
spending a significant chunk of time reading this code in the prior weeks
and a timely and motivating conversation with Eric Pugh, I dumped a chunk
of my weekend into an experiment to see if I could pull CoreContainer out
of the dispatch filter, and leverage the facilities of our servlet
container.

That wasn't too terribly hard, but keeping JettySolrRunner happy was very
confusing, and worrisome since I've realized it's not respecting our
web.xml at all, and any configuration in web.xml needs to be duplicated for
our tests in JettySolrRunner (tangent alert)

The result is that CoreContainer is now held by a class called CoreService
(please help me name things if you don't like my names :) ). CoreService is
a ServletContextListener, appropriately configured in web.xml, and has a
static method that can be used to get a reference to the CoreContainer
corresponding to the ServletContext in which code wanting a core container
is running (this supports having multiple JettySolrRunners in tests, though
probably never has more than one CoreContainer in the running application)

I achieved this in 4 stages shown here:
https://github.com/gus-asf/solr/tree/servlet-solr

Ignore the AdminServlet class, it's a placeholder, and can be subtracted
without harm.

Since the current state of the code in that branch is apparently
test-stable (4 runs of check in a row passing, none slower than any run of
3 runs of main, both as low as 10.5 min if I don't continue working on the
machine)...

Do we want to push this refactor in now to avoid making a huge ball of
changes that gets harder and harder to merge? The next push point would
probably be when AdminServlet was functional (if/when that happens) (and we
could not push that class for now).

If you read this far, thanks :) I wasn't sure how feasible this would be so
I felt the need to prove it to my self in code before wasting your time,
but please don't hesitate to point to costs I might be missing or anything
that looks like a mistake, or say why this was a total waste of time :)

Thoughts?

-Gus
-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Reply via email to