*ServletContextHandler not ServletContext On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <[email protected]> wrote:
> Ok nope, since that only happens with WebAppContext, not ServletContext > (the parent class). whew :) > > On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <[email protected]> wrote: > >> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and >> thus lack of metadata-complete=true in web.xml means that we wind up >> scanning for annotations? >> >> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <[email protected]> wrote: >> >>> https://issues.apache.org/jira/browse/SOLR-15590 >>> >>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <[email protected]> wrote: >>> >>>> Ok well it would be interesting to compare quickstarting vs >>>> JettySolrRunner, and reading up on quickstart >>>> <https://webtide.com/jetty-9-quick-start/> gives ideas about >>>> pre-building the quickstart xml, but web.xml for JettySolrRunner is a >>>> tangent. Either way we are still in a container and I think I hear some >>>> agreement that something should be done about the dispatch here, and both >>>> of you seem to agree that an actual servlet would make more sense than a >>>> filter, so I'll make a ticket/pr to make it easier to track & easier to >>>> read the code. >>>> >>>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <[email protected]> >>>> wrote: >>>> >>>>> The downside to respecting web.xml and making JettySolrServer serve a >>>>> webapp is that loading a webapp is very expensive and slow in comparison. >>>>> JettySolrServer actually starts up extremely quickly. It’s almost more >>>>> appealing to change the Server to use the JettySolrServer strategy. It’s >>>>> so >>>>> slow to load a webapp because of all the things it needs to support and >>>>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty >>>>> QuickStart does improve the situation if used at least. But there is >>>>> really >>>>> no need to eat the whole webapp standard for a non webapp app to have >>>>> Jetty >>>>> manage more of the dispatching. I always wondering about the motivation / >>>>> upside of hacking in a straight servlet myself. But for a non webapp stuck >>>>> in a servlet container, it’s actually a beautiful move that side steps a >>>>> bunch of slow crap even better than the QuickStart pushes ever will, and >>>>> still allows for matching any support we would want. >>>>> >>>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I >>>>> made a slim SolrCall base class that each extends. All that logic is hairy >>>>> enough to follow without them interleaving and sharing in a kind of wild >>>>> collaboration. It’s pretty unfortunate they both still exist. >>>>> >>>>> You are right, leaving that path in that state is a poor idea. If >>>>> there was anything that could be considered the “hot” path, it’s right >>>>> there - feverishly checking and ripping up streams for anyone and >>>>> anything. >>>>> Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut >>>>> the >>>>> path down? Do a dance? Treat the string like a date and see if that works. >>>>> >>>>> So yeah, agreed, nobody does or would dispatch this way, you have to >>>>> frog boil into it. It’s slow, almost incomprehensible, and incredibly good >>>>> at holding onto life, even building on it. But being a webapp and parsing >>>>> web.xml has little to do with dispatching and having sensible http api >>>>> that >>>>> doesn’t work like it’s a perl compiler. >>>>> >>>>> Anyway, I can’t imagine trying to trace through that code with any >>>>> solid feeling about having a handle on it via inspection. But just like >>>>> Perl, if you debug / trace log the hell out of it, it is all actually >>>>> pretty simple to adjust and reign in. >>>>> >>>>> >>>>> MRM >>>>> >>>>> >>>>> >>>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <[email protected]> wrote: >>>>> >>>>>> So I'm not interested in *adhering* to anything here, just using >>>>>> stuff where it helps... I see tools sitting on the shelf, already bought >>>>>> and paid for and carried with us to every job but then they sit in the >>>>>> back >>>>>> of the truck mostly unused... and (I think) they look useful. >>>>>> >>>>>> This should not be seen as in any way advocating a return to war file >>>>>> deployment or "choose your container". All of what I would suggest should >>>>>> likely be compatible with a jetty startup controlled by our code (I >>>>>> assume >>>>>> we can convince it to read web.xml properly when doing so). Certainly >>>>>> point >>>>>> out anything that would interfere with that. >>>>>> >>>>>> Here we sit in a container that is a tool box containing: >>>>>> >>>>>> - A nice mechanism for initializing stuff at the application >>>>>> level, (ServletContextListener) and a well defined guarantee that it >>>>>> runs >>>>>> before any filter or servlet, >>>>>> - An automatic shutdown mechanism that it will call for us on >>>>>> graceful shutdown (ServletContextListner again). >>>>>> - A nice layered filter chain mechanism which could have been >>>>>> used to layer in things like tracing and authentication, and close >>>>>> shields >>>>>> etc as small succinct filters rather than weaving them into an ever >>>>>> more >>>>>> complex filter & call class. >>>>>> - In more recent versions of the spec, for listeners defined in >>>>>> web.xml the order is also guaranteed. >>>>>> - Servlet classes that are *already* set up to distinguish http >>>>>> verbs automatically when desired >>>>>> >>>>>> So why is this better? Because monster classes that do a hundred >>>>>> things are really hard to understand and maintain. Small methods, small >>>>>> classes whenever possible. I also suspect that there may be some gains in >>>>>> performance to be had if we rely more on the container (which will >>>>>> already >>>>>> be dispatching based on path) to choose our code paths (at least at a >>>>>> coarse level) and then have less custom dispatch logic executed on >>>>>> *every* >>>>>> request >>>>>> >>>>>> Obviously I'm wrong and if the net result is less performant to any >>>>>> significant degree forget it that wouldn't be worth it. (wanted: >>>>>> standardized solr benchmarks) >>>>>> >>>>>> There WILL be complications with v2 because it is a subclass of >>>>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it >>>>>> should be a separate servlet from v1, but we don't want to duplicate code >>>>>> either... so work to do there... >>>>>> >>>>>> I think an incremental approach is necessary since very few of us >>>>>> have the bandwidth to more than that and certainly it becomes difficult >>>>>> to >>>>>> find anyone with the time to review large changes. What I have thus far >>>>>> is >>>>>> stable with respect to the tests, and simplifies some stuff already which >>>>>> is why I chose this point to start the discussion >>>>>> >>>>>> But yeah, look at what I did and say what you like and what you >>>>>> don't. :). >>>>>> >>>>>> -Gus >>>>>> >>>>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Sounds like an interesting adventure last weekend. >>>>>>> >>>>>>> I'm unclear what the point of going this direction is; my instinct >>>>>>> is to go the opposite direction. You seem to suggest there are some >>>>>>> simplification/organization benefits, which I love, so I'll need to >>>>>>> look at >>>>>>> what you've done to judge that for myself. Yes Jetty supports the >>>>>>> Servlet >>>>>>> spec but we need not embrace it. Adhering to that is useful if you >>>>>>> have a >>>>>>> generic web app that can be deployed to a container of the user's >>>>>>> convenience/choosing. No doubt this is why Solr started this way, and >>>>>>> why >>>>>>> the apps I built in my early days adhered to that spec. But since 6.0, >>>>>>> we >>>>>>> view Solr as self-contained and more supportable if the project makes >>>>>>> these >>>>>>> decisions and thus not needlessly constrain itself as well. >>>>>>> >>>>>>> It is super weird to me that SolrDispatchFilter is a Servlet >>>>>>> *Filter* and not a *Servlet* itself. >>>>>>> >>>>>>> Also, I suspect there may be complications in changes here relating >>>>>>> to Solr's v1 vs v2 API. And most definitely also what you discovered -- >>>>>>> JettySolrRunner. >>>>>>> >>>>>>> ~ David Smiley >>>>>>> Apache Lucene/Solr Search Developer >>>>>>> http://www.linkedin.com/in/davidwsmiley >>>>>>> >>>>>>> >>>>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> *TLDR:* I've got a working branch >>>>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where >>>>>>>> CoreContainer & our startup process is extracted from SolrDispatch >>>>>>>> Filter. >>>>>>>> Do other folks think that is interesting enough that I should make a >>>>>>>> JIRA >>>>>>>> and/or PR with intention to make this change? >>>>>>>> >>>>>>>> *Details:* >>>>>>>> >>>>>>>> Jetty is a servlet container, yet we more or less ignore it by >>>>>>>> stuffing everything into a single filter (almost, admin UI is served >>>>>>>> separately). I'm sure there are lots of historical reasons for this, >>>>>>>> probably including servlet containers and their specs were much less >>>>>>>> mature >>>>>>>> when solr was first started. Maybe also the early authors were more >>>>>>>> focused >>>>>>>> on making search work than leveraging what the container could do for >>>>>>>> them >>>>>>>> (but I wasn't there for that so that's just a guess). >>>>>>>> >>>>>>>> The result is that we have a couple of very large classes that are >>>>>>>> touched by almost every request, and they have a lot of conditional >>>>>>>> logic >>>>>>>> trying to decide what the user is asking. Normally this sort of >>>>>>>> dispatch is >>>>>>>> done by the container based on the request URL to direct it to the >>>>>>>> appropriate servlet. Solr does a LOT of different things so this code >>>>>>>> is >>>>>>>> extremely tricky and complex to understand. Specifically, I'm speaking >>>>>>>> of >>>>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that >>>>>>>> being >>>>>>>> two classes probably makes them harder to understand. >>>>>>>> >>>>>>>> It seems to me (and this mail is asking if you agree) that these >>>>>>>> classes are long overdue for some subdivision. The most obvious thing >>>>>>>> to >>>>>>>> pull out is all the admin calls. Admin code paths really have little or >>>>>>>> nothing to do with query or update code paths since there are no >>>>>>>> documents >>>>>>>> to route or sub-requests to some subset of nodes. >>>>>>>> >>>>>>>> The primary obstacle to any such separation and simplification is >>>>>>>> that most requests have some interaction with CoreContainer, and the >>>>>>>> things >>>>>>>> it holds, and this is initialized and held by a field in >>>>>>>> SolrDispatchFilter. After spending a significant chunk of time reading >>>>>>>> this >>>>>>>> code in the prior weeks and a timely and motivating conversation with >>>>>>>> Eric >>>>>>>> Pugh, I dumped a chunk of my weekend into an experiment to see if I >>>>>>>> could >>>>>>>> pull CoreContainer out of the dispatch filter, and leverage the >>>>>>>> facilities >>>>>>>> of our servlet container. >>>>>>>> >>>>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy >>>>>>>> was very confusing, and worrisome since I've realized it's not >>>>>>>> respecting >>>>>>>> our web.xml at all, and any configuration in web.xml needs to be >>>>>>>> duplicated >>>>>>>> for our tests in JettySolrRunner (tangent alert) >>>>>>>> >>>>>>>> The result is that CoreContainer is now held by a class called >>>>>>>> CoreService (please help me name things if you don't like my names :) >>>>>>>> ). >>>>>>>> CoreService is a ServletContextListener, appropriately configured in >>>>>>>> web.xml, and has a static method that can be used to get a reference >>>>>>>> to the >>>>>>>> CoreContainer corresponding to the ServletContext in which code >>>>>>>> wanting a >>>>>>>> core container is running (this supports having multiple >>>>>>>> JettySolrRunners >>>>>>>> in tests, though probably never has more than one CoreContainer in the >>>>>>>> running application) >>>>>>>> >>>>>>>> I achieved this in 4 stages shown here: >>>>>>>> https://github.com/gus-asf/solr/tree/servlet-solr >>>>>>>> >>>>>>>> Ignore the AdminServlet class, it's a placeholder, and can be >>>>>>>> subtracted without harm. >>>>>>>> >>>>>>>> Since the current state of the code in that branch is apparently >>>>>>>> test-stable (4 runs of check in a row passing, none slower than any >>>>>>>> run of >>>>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on >>>>>>>> the >>>>>>>> machine)... >>>>>>>> >>>>>>>> Do we want to push this refactor in now to avoid making a huge ball >>>>>>>> of changes that gets harder and harder to merge? The next push point >>>>>>>> would >>>>>>>> probably be when AdminServlet was functional (if/when that happens) >>>>>>>> (and we >>>>>>>> could not push that class for now). >>>>>>>> >>>>>>>> If you read this far, thanks :) I wasn't sure how feasible this >>>>>>>> would be so I felt the need to prove it to my self in code before >>>>>>>> wasting >>>>>>>> your time, but please don't hesitate to point to costs I might be >>>>>>>> missing >>>>>>>> or anything that looks like a mistake, or say why this was a total >>>>>>>> waste of >>>>>>>> time :) >>>>>>>> >>>>>>>> Thoughts? >>>>>>>> >>>>>>>> -Gus >>>>>>>> -- >>>>>>>> http://www.needhamsoftware.com (work) >>>>>>>> http://www.the111shift.com (play) >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> http://www.needhamsoftware.com (work) >>>>>> http://www.the111shift.com (play) >>>>>> >>>>> -- >>>>> - Mark >>>>> >>>>> http://about.me/markrmiller >>>>> >>>> >>>> >>>> -- >>>> http://www.needhamsoftware.com (work) >>>> http://www.the111shift.com (play) >>>> >>> >>> >>> -- >>> http://www.needhamsoftware.com (work) >>> http://www.the111shift.com (play) >>> >> >> >> -- >> http://www.needhamsoftware.com (work) >> http://www.the111shift.com (play) >> > > > -- > http://www.needhamsoftware.com (work) > http://www.the111shift.com (play) > -- http://www.needhamsoftware.com (work) http://www.the111shift.com (play)
