https://issues.apache.org/jira/browse/SOLR-15590
On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <[email protected]> wrote: > Ok well it would be interesting to compare quickstarting vs > JettySolrRunner, and reading up on quickstart > <https://webtide.com/jetty-9-quick-start/> gives ideas about pre-building > the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either > way we are still in a container and I think I hear some agreement that > something should be done about the dispatch here, and both of you seem to > agree that an actual servlet would make more sense than a filter, so I'll > make a ticket/pr to make it easier to track & easier to read the code. > > On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <[email protected]> wrote: > >> The downside to respecting web.xml and making JettySolrServer serve a >> webapp is that loading a webapp is very expensive and slow in comparison. >> JettySolrServer actually starts up extremely quickly. It’s almost more >> appealing to change the Server to use the JettySolrServer strategy. It’s so >> slow to load a webapp because of all the things it needs to support and >> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty >> QuickStart does improve the situation if used at least. But there is really >> no need to eat the whole webapp standard for a non webapp app to have Jetty >> manage more of the dispatching. I always wondering about the motivation / >> upside of hacking in a straight servlet myself. But for a non webapp stuck >> in a servlet container, it’s actually a beautiful move that side steps a >> bunch of slow crap even better than the QuickStart pushes ever will, and >> still allows for matching any support we would want. >> >> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I >> made a slim SolrCall base class that each extends. All that logic is hairy >> enough to follow without them interleaving and sharing in a kind of wild >> collaboration. It’s pretty unfortunate they both still exist. >> >> You are right, leaving that path in that state is a poor idea. If there >> was anything that could be considered the “hot” path, it’s right there - >> feverishly checking and ripping up streams for anyone and anything. Is a >> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path >> down? Do a dance? Treat the string like a date and see if that works. >> >> So yeah, agreed, nobody does or would dispatch this way, you have to frog >> boil into it. It’s slow, almost incomprehensible, and incredibly good at >> holding onto life, even building on it. But being a webapp and parsing >> web.xml has little to do with dispatching and having sensible http api that >> doesn’t work like it’s a perl compiler. >> >> Anyway, I can’t imagine trying to trace through that code with any solid >> feeling about having a handle on it via inspection. But just like Perl, if >> you debug / trace log the hell out of it, it is all actually pretty simple >> to adjust and reign in. >> >> >> MRM >> >> >> >> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <[email protected]> wrote: >> >>> So I'm not interested in *adhering* to anything here, just using stuff >>> where it helps... I see tools sitting on the shelf, already bought and paid >>> for and carried with us to every job but then they sit in the back of the >>> truck mostly unused... and (I think) they look useful. >>> >>> This should not be seen as in any way advocating a return to war file >>> deployment or "choose your container". All of what I would suggest should >>> likely be compatible with a jetty startup controlled by our code (I assume >>> we can convince it to read web.xml properly when doing so). Certainly point >>> out anything that would interfere with that. >>> >>> Here we sit in a container that is a tool box containing: >>> >>> - A nice mechanism for initializing stuff at the application level, >>> (ServletContextListener) and a well defined guarantee that it runs before >>> any filter or servlet, >>> - An automatic shutdown mechanism that it will call for us on >>> graceful shutdown (ServletContextListner again). >>> - A nice layered filter chain mechanism which could have been used >>> to layer in things like tracing and authentication, and close shields etc >>> as small succinct filters rather than weaving them into an ever more >>> complex filter & call class. >>> - In more recent versions of the spec, for listeners defined in >>> web.xml the order is also guaranteed. >>> - Servlet classes that are *already* set up to distinguish http >>> verbs automatically when desired >>> >>> So why is this better? Because monster classes that do a hundred things >>> are really hard to understand and maintain. Small methods, small classes >>> whenever possible. I also suspect that there may be some gains in >>> performance to be had if we rely more on the container (which will already >>> be dispatching based on path) to choose our code paths (at least at a >>> coarse level) and then have less custom dispatch logic executed on *every* >>> request >>> >>> Obviously I'm wrong and if the net result is less performant to any >>> significant degree forget it that wouldn't be worth it. (wanted: >>> standardized solr benchmarks) >>> >>> There WILL be complications with v2 because it is a subclass of >>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it >>> should be a separate servlet from v1, but we don't want to duplicate code >>> either... so work to do there... >>> >>> I think an incremental approach is necessary since very few of us have >>> the bandwidth to more than that and certainly it becomes difficult to find >>> anyone with the time to review large changes. What I have thus far is >>> stable with respect to the tests, and simplifies some stuff already which >>> is why I chose this point to start the discussion >>> >>> But yeah, look at what I did and say what you like and what you don't. >>> :). >>> >>> -Gus >>> >>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <[email protected]> wrote: >>> >>>> Sounds like an interesting adventure last weekend. >>>> >>>> I'm unclear what the point of going this direction is; my instinct is >>>> to go the opposite direction. You seem to suggest there are some >>>> simplification/organization benefits, which I love, so I'll need to look at >>>> what you've done to judge that for myself. Yes Jetty supports the Servlet >>>> spec but we need not embrace it. Adhering to that is useful if you have a >>>> generic web app that can be deployed to a container of the user's >>>> convenience/choosing. No doubt this is why Solr started this way, and why >>>> the apps I built in my early days adhered to that spec. But since 6.0, we >>>> view Solr as self-contained and more supportable if the project makes these >>>> decisions and thus not needlessly constrain itself as well. >>>> >>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* >>>> and not a *Servlet* itself. >>>> >>>> Also, I suspect there may be complications in changes here relating to >>>> Solr's v1 vs v2 API. And most definitely also what you discovered -- >>>> JettySolrRunner. >>>> >>>> ~ David Smiley >>>> Apache Lucene/Solr Search Developer >>>> http://www.linkedin.com/in/davidwsmiley >>>> >>>> >>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <[email protected]> wrote: >>>> >>>>> *TLDR:* I've got a working branch >>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where >>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter. >>>>> Do other folks think that is interesting enough that I should make a JIRA >>>>> and/or PR with intention to make this change? >>>>> >>>>> *Details:* >>>>> >>>>> Jetty is a servlet container, yet we more or less ignore it by >>>>> stuffing everything into a single filter (almost, admin UI is served >>>>> separately). I'm sure there are lots of historical reasons for this, >>>>> probably including servlet containers and their specs were much less >>>>> mature >>>>> when solr was first started. Maybe also the early authors were more >>>>> focused >>>>> on making search work than leveraging what the container could do for them >>>>> (but I wasn't there for that so that's just a guess). >>>>> >>>>> The result is that we have a couple of very large classes that are >>>>> touched by almost every request, and they have a lot of conditional logic >>>>> trying to decide what the user is asking. Normally this sort of dispatch >>>>> is >>>>> done by the container based on the request URL to direct it to the >>>>> appropriate servlet. Solr does a LOT of different things so this code is >>>>> extremely tricky and complex to understand. Specifically, I'm speaking of >>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being >>>>> two classes probably makes them harder to understand. >>>>> >>>>> It seems to me (and this mail is asking if you agree) that these >>>>> classes are long overdue for some subdivision. The most obvious thing to >>>>> pull out is all the admin calls. Admin code paths really have little or >>>>> nothing to do with query or update code paths since there are no documents >>>>> to route or sub-requests to some subset of nodes. >>>>> >>>>> The primary obstacle to any such separation and simplification is that >>>>> most requests have some interaction with CoreContainer, and the things it >>>>> holds, and this is initialized and held by a field in SolrDispatchFilter. >>>>> After spending a significant chunk of time reading this code in the prior >>>>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a >>>>> chunk of my weekend into an experiment to see if I could pull >>>>> CoreContainer >>>>> out of the dispatch filter, and leverage the facilities of our servlet >>>>> container. >>>>> >>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was >>>>> very confusing, and worrisome since I've realized it's not respecting our >>>>> web.xml at all, and any configuration in web.xml needs to be duplicated >>>>> for >>>>> our tests in JettySolrRunner (tangent alert) >>>>> >>>>> The result is that CoreContainer is now held by a class called >>>>> CoreService (please help me name things if you don't like my names :) ). >>>>> CoreService is a ServletContextListener, appropriately configured in >>>>> web.xml, and has a static method that can be used to get a reference to >>>>> the >>>>> CoreContainer corresponding to the ServletContext in which code wanting a >>>>> core container is running (this supports having multiple JettySolrRunners >>>>> in tests, though probably never has more than one CoreContainer in the >>>>> running application) >>>>> >>>>> I achieved this in 4 stages shown here: >>>>> https://github.com/gus-asf/solr/tree/servlet-solr >>>>> >>>>> Ignore the AdminServlet class, it's a placeholder, and can be >>>>> subtracted without harm. >>>>> >>>>> Since the current state of the code in that branch is apparently >>>>> test-stable (4 runs of check in a row passing, none slower than any run of >>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the >>>>> machine)... >>>>> >>>>> Do we want to push this refactor in now to avoid making a huge ball of >>>>> changes that gets harder and harder to merge? The next push point would >>>>> probably be when AdminServlet was functional (if/when that happens) (and >>>>> we >>>>> could not push that class for now). >>>>> >>>>> If you read this far, thanks :) I wasn't sure how feasible this would >>>>> be so I felt the need to prove it to my self in code before wasting your >>>>> time, but please don't hesitate to point to costs I might be missing or >>>>> anything that looks like a mistake, or say why this was a total waste of >>>>> time :) >>>>> >>>>> Thoughts? >>>>> >>>>> -Gus >>>>> -- >>>>> http://www.needhamsoftware.com (work) >>>>> http://www.the111shift.com (play) >>>>> >>>> >>> >>> -- >>> http://www.needhamsoftware.com (work) >>> http://www.the111shift.com (play) >>> >> -- >> - Mark >> >> http://about.me/markrmiller >> > > > -- > http://www.needhamsoftware.com (work) > http://www.the111shift.com (play) > -- http://www.needhamsoftware.com (work) http://www.the111shift.com (play)
