*TLDR:* I've got a working branch <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer & our startup process is extracted from SolrDispatch Filter. Do other folks think that is interesting enough that I should make a JIRA and/or PR with intention to make this change?
*Details:* Jetty is a servlet container, yet we more or less ignore it by stuffing everything into a single filter (almost, admin UI is served separately). I'm sure there are lots of historical reasons for this, probably including servlet containers and their specs were much less mature when solr was first started. Maybe also the early authors were more focused on making search work than leveraging what the container could do for them (but I wasn't there for that so that's just a guess). The result is that we have a couple of very large classes that are touched by almost every request, and they have a lot of conditional logic trying to decide what the user is asking. Normally this sort of dispatch is done by the container based on the request URL to direct it to the appropriate servlet. Solr does a LOT of different things so this code is extremely tricky and complex to understand. Specifically, I'm speaking of SolrDispatchFilter and HttpSolrCall, which are so inseparable that being two classes probably makes them harder to understand. It seems to me (and this mail is asking if you agree) that these classes are long overdue for some subdivision. The most obvious thing to pull out is all the admin calls. Admin code paths really have little or nothing to do with query or update code paths since there are no documents to route or sub-requests to some subset of nodes. The primary obstacle to any such separation and simplification is that most requests have some interaction with CoreContainer, and the things it holds, and this is initialized and held by a field in SolrDispatchFilter. After spending a significant chunk of time reading this code in the prior weeks and a timely and motivating conversation with Eric Pugh, I dumped a chunk of my weekend into an experiment to see if I could pull CoreContainer out of the dispatch filter, and leverage the facilities of our servlet container. That wasn't too terribly hard, but keeping JettySolrRunner happy was very confusing, and worrisome since I've realized it's not respecting our web.xml at all, and any configuration in web.xml needs to be duplicated for our tests in JettySolrRunner (tangent alert) The result is that CoreContainer is now held by a class called CoreService (please help me name things if you don't like my names :) ). CoreService is a ServletContextListener, appropriately configured in web.xml, and has a static method that can be used to get a reference to the CoreContainer corresponding to the ServletContext in which code wanting a core container is running (this supports having multiple JettySolrRunners in tests, though probably never has more than one CoreContainer in the running application) I achieved this in 4 stages shown here: https://github.com/gus-asf/solr/tree/servlet-solr Ignore the AdminServlet class, it's a placeholder, and can be subtracted without harm. Since the current state of the code in that branch is apparently test-stable (4 runs of check in a row passing, none slower than any run of 3 runs of main, both as low as 10.5 min if I don't continue working on the machine)... Do we want to push this refactor in now to avoid making a huge ball of changes that gets harder and harder to merge? The next push point would probably be when AdminServlet was functional (if/when that happens) (and we could not push that class for now). If you read this far, thanks :) I wasn't sure how feasible this would be so I felt the need to prove it to my self in code before wasting your time, but please don't hesitate to point to costs I might be missing or anything that looks like a mistake, or say why this was a total waste of time :) Thoughts? -Gus -- http://www.needhamsoftware.com (work) http://www.the111shift.com (play)
