Re: Solr and Jetty and Servlets

Gus Heck Tue, 17 Aug 2021 08:12:15 -0700

*ServletContextHandler not ServletContext

On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <[email protected]> wrote:


> Ok nope, since that only happens with WebAppContext, not ServletContext
> (the parent class). whew :)
>
> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <[email protected]> wrote:
>
>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and
>> thus lack of metadata-complete=true in web.xml means that we wind up
>> scanning for annotations?
>>
>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <[email protected]> wrote:
>>
>>> https://issues.apache.org/jira/browse/SOLR-15590
>>>
>>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <[email protected]> wrote:
>>>
>>>> Ok well it would be interesting to compare quickstarting vs
>>>> JettySolrRunner, and reading up on quickstart
>>>> <https://webtide.com/jetty-9-quick-start/> gives ideas about
>>>> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
>>>> tangent. Either way we are still in a container and I think I hear some
>>>> agreement that something should be done about the dispatch here, and both
>>>> of you seem to agree that an actual servlet would make more sense than a
>>>> filter, so I'll make a ticket/pr to make it easier to track & easier to
>>>> read the code.
>>>>
>>>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <[email protected]>
>>>> wrote:
>>>>
>>>>> The downside to respecting web.xml and making JettySolrServer serve a
>>>>> webapp is that loading a webapp is very expensive and slow in comparison.
>>>>> JettySolrServer actually starts up extremely quickly. It’s almost more
>>>>> appealing to change the Server to use the JettySolrServer strategy. It’s 
>>>>> so
>>>>> slow to load a webapp because of all the things it needs to support and
>>>>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
>>>>> QuickStart does improve the situation if used at least. But there is 
>>>>> really
>>>>> no need to eat the whole webapp standard for a non webapp app to have 
>>>>> Jetty
>>>>> manage more of the dispatching. I always wondering about the motivation /
>>>>> upside of hacking in a straight servlet myself. But for a non webapp stuck
>>>>> in a servlet container, it’s actually a beautiful move that side steps a
>>>>> bunch of slow crap even better than the QuickStart pushes ever will, and
>>>>> still allows for matching any support we would want.
>>>>>
>>>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
>>>>> made a slim SolrCall base class that each extends. All that logic is hairy
>>>>> enough to follow without them interleaving and sharing in a kind of wild
>>>>> collaboration. It’s pretty unfortunate they both still exist.
>>>>>
>>>>> You are right, leaving that path in that state is a poor idea. If
>>>>> there was anything that could be considered the “hot” path, it’s right
>>>>> there - feverishly checking and ripping up streams for anyone and 
>>>>> anything.
>>>>> Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut 
>>>>> the
>>>>> path down? Do a dance? Treat the string like a date and see if that works.
>>>>>
>>>>> So yeah, agreed, nobody does or would dispatch this way, you have to
>>>>> frog boil into it. It’s slow, almost incomprehensible, and incredibly good
>>>>> at holding onto life, even building on it. But being a webapp and parsing
>>>>> web.xml has little to do with dispatching and having sensible http api 
>>>>> that
>>>>> doesn’t work like it’s a perl compiler.
>>>>>
>>>>> Anyway, I can’t imagine trying to trace through that code with any
>>>>> solid feeling about having a handle on it via inspection. But just like
>>>>> Perl, if you debug / trace log the hell out of it, it is all actually
>>>>> pretty simple to adjust and reign in.
>>>>>
>>>>>
>>>>> MRM
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <[email protected]> wrote:
>>>>>
>>>>>> So I'm not interested in *adhering* to anything here, just using
>>>>>> stuff where it helps... I see tools sitting on the shelf, already bought
>>>>>> and paid for and carried with us to every job but then they sit in the 
>>>>>> back
>>>>>> of the truck mostly unused... and (I think) they look useful.
>>>>>>
>>>>>> This should not be seen as in any way advocating a return to war file
>>>>>> deployment or "choose your container". All of what I would suggest should
>>>>>> likely be compatible with a jetty startup controlled by our code (I 
>>>>>> assume
>>>>>> we can convince it to read web.xml properly when doing so). Certainly 
>>>>>> point
>>>>>> out anything that would interfere with that.
>>>>>>
>>>>>> Here we sit in a container that is a tool box containing:
>>>>>>
>>>>>>    - A nice mechanism for initializing stuff at the application
>>>>>>    level, (ServletContextListener) and a well defined guarantee that it 
>>>>>> runs
>>>>>>    before any filter or servlet,
>>>>>>    - An automatic shutdown mechanism that it will call for us on
>>>>>>    graceful shutdown (ServletContextListner again).
>>>>>>    - A nice layered filter chain mechanism which could have been
>>>>>>    used to layer in things like tracing and authentication, and close 
>>>>>> shields
>>>>>>    etc as small succinct filters rather than weaving them into an ever 
>>>>>> more
>>>>>>    complex filter & call class.
>>>>>>    - In more recent versions of the spec, for listeners defined in
>>>>>>    web.xml the order is also guaranteed.
>>>>>>    - Servlet classes that are *already* set up to distinguish http
>>>>>>    verbs automatically when desired
>>>>>>
>>>>>> So why is this better? Because monster classes that do a hundred
>>>>>> things are really hard to understand and maintain. Small methods, small
>>>>>> classes whenever possible. I also suspect that there may be some gains in
>>>>>> performance to be had if we rely more on the container (which will 
>>>>>> already
>>>>>> be dispatching based on path) to choose our code paths (at least at a
>>>>>> coarse level) and then have less custom dispatch logic executed on 
>>>>>> *every*
>>>>>> request
>>>>>>
>>>>>> Obviously I'm wrong and if the net result is less performant to any
>>>>>> significant degree forget it that wouldn't be worth it. (wanted:
>>>>>> standardized solr benchmarks)
>>>>>>
>>>>>> There WILL be complications with v2 because it is a subclass of
>>>>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>>>>>> should be a separate servlet from v1, but we don't want to duplicate code
>>>>>> either... so work to do there...
>>>>>>
>>>>>> I think an incremental approach is necessary since very few of us
>>>>>> have the bandwidth to more than that and certainly it becomes difficult 
>>>>>> to
>>>>>> find anyone with the time to review large changes. What I have thus far 
>>>>>> is
>>>>>> stable with respect to the tests, and simplifies some stuff already which
>>>>>> is why I chose this point to start the discussion
>>>>>>
>>>>>> But yeah, look at what I did and say what you like and what you
>>>>>> don't. :).
>>>>>>
>>>>>> -Gus
>>>>>>
>>>>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Sounds like an interesting adventure last weekend.
>>>>>>>
>>>>>>> I'm unclear what the point of going this direction is; my instinct
>>>>>>> is to go the opposite direction.  You seem to suggest there are some
>>>>>>> simplification/organization benefits, which I love, so I'll need to 
>>>>>>> look at
>>>>>>> what you've done to judge that for myself.  Yes Jetty supports the 
>>>>>>> Servlet
>>>>>>> spec but we need not embrace it.  Adhering to that is useful if you 
>>>>>>> have a
>>>>>>> generic web app that can be deployed to a container of the user's
>>>>>>> convenience/choosing.  No doubt this is why Solr started this way, and 
>>>>>>> why
>>>>>>> the apps I built in my early days adhered to that spec.  But since 6.0, 
>>>>>>> we
>>>>>>> view Solr as self-contained and more supportable if the project makes 
>>>>>>> these
>>>>>>> decisions and thus not needlessly constrain itself as well.
>>>>>>>
>>>>>>> It is super weird to me that SolrDispatchFilter is a Servlet
>>>>>>> *Filter* and not a *Servlet* itself.
>>>>>>>
>>>>>>> Also, I suspect there may be complications in changes here relating
>>>>>>> to Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>>>>>> JettySolrRunner.
>>>>>>>
>>>>>>> ~ David Smiley
>>>>>>> Apache Lucene/Solr Search Developer
>>>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> *TLDR:* I've got a working branch
>>>>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>>>>>> CoreContainer & our startup process is extracted from SolrDispatch 
>>>>>>>> Filter.
>>>>>>>> Do other folks think that is interesting enough that I should make a 
>>>>>>>> JIRA
>>>>>>>> and/or PR with intention to make this change?
>>>>>>>>
>>>>>>>> *Details:*
>>>>>>>>
>>>>>>>> Jetty is a servlet container, yet we more or less ignore it by
>>>>>>>> stuffing everything into a single filter (almost, admin UI is served
>>>>>>>> separately). I'm sure there are lots of historical reasons for this,
>>>>>>>> probably including servlet containers and their specs were much less 
>>>>>>>> mature
>>>>>>>> when solr was first started. Maybe also the early authors were more 
>>>>>>>> focused
>>>>>>>> on making search work than leveraging what the container could do for 
>>>>>>>> them
>>>>>>>> (but I wasn't there for that so that's just a guess).
>>>>>>>>
>>>>>>>> The result is that we have a couple of very large classes that are
>>>>>>>> touched by almost every request, and they have a lot of conditional 
>>>>>>>> logic
>>>>>>>> trying to decide what the user is asking. Normally this sort of 
>>>>>>>> dispatch is
>>>>>>>> done by the container based on the request URL to direct it to the
>>>>>>>> appropriate servlet. Solr does a LOT of different things so this code 
>>>>>>>> is
>>>>>>>> extremely tricky and complex to understand. Specifically, I'm speaking 
>>>>>>>> of
>>>>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that 
>>>>>>>> being
>>>>>>>> two classes probably makes them harder to understand.
>>>>>>>>
>>>>>>>> It seems to me (and this mail is asking if you agree) that these
>>>>>>>> classes are long overdue for some subdivision. The most obvious thing 
>>>>>>>> to
>>>>>>>> pull out is all the admin calls. Admin code paths really have little or
>>>>>>>> nothing to do with query or update code paths since there are no 
>>>>>>>> documents
>>>>>>>> to route or sub-requests to some subset of nodes.
>>>>>>>>
>>>>>>>> The primary obstacle to any such separation and simplification is
>>>>>>>> that most requests have some interaction with CoreContainer, and the 
>>>>>>>> things
>>>>>>>> it holds, and this is initialized and held by a field in
>>>>>>>> SolrDispatchFilter. After spending a significant chunk of time reading 
>>>>>>>> this
>>>>>>>> code in the prior weeks and a timely and motivating conversation with 
>>>>>>>> Eric
>>>>>>>> Pugh, I dumped a chunk of my weekend into an experiment to see if I 
>>>>>>>> could
>>>>>>>> pull CoreContainer out of the dispatch filter, and leverage the 
>>>>>>>> facilities
>>>>>>>> of our servlet container.
>>>>>>>>
>>>>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy
>>>>>>>> was very confusing, and worrisome since I've realized it's not 
>>>>>>>> respecting
>>>>>>>> our web.xml at all, and any configuration in web.xml needs to be 
>>>>>>>> duplicated
>>>>>>>> for our tests in JettySolrRunner (tangent alert)
>>>>>>>>
>>>>>>>> The result is that CoreContainer is now held by a class called
>>>>>>>> CoreService (please help me name things if you don't like my names :) 
>>>>>>>> ).
>>>>>>>> CoreService is a ServletContextListener, appropriately configured in
>>>>>>>> web.xml, and has a static method that can be used to get a reference 
>>>>>>>> to the
>>>>>>>> CoreContainer corresponding to the ServletContext in which code 
>>>>>>>> wanting a
>>>>>>>> core container is running (this supports having multiple 
>>>>>>>> JettySolrRunners
>>>>>>>> in tests, though probably never has more than one CoreContainer in the
>>>>>>>> running application)
>>>>>>>>
>>>>>>>> I achieved this in 4 stages shown here:
>>>>>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>>>>>
>>>>>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>>>>>> subtracted without harm.
>>>>>>>>
>>>>>>>> Since the current state of the code in that branch is apparently
>>>>>>>> test-stable (4 runs of check in a row passing, none slower than any 
>>>>>>>> run of
>>>>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on 
>>>>>>>> the
>>>>>>>> machine)...
>>>>>>>>
>>>>>>>> Do we want to push this refactor in now to avoid making a huge ball
>>>>>>>> of changes that gets harder and harder to merge? The next push point 
>>>>>>>> would
>>>>>>>> probably be when AdminServlet was functional (if/when that happens) 
>>>>>>>> (and we
>>>>>>>> could not push that class for now).
>>>>>>>>
>>>>>>>> If you read this far, thanks :) I wasn't sure how feasible this
>>>>>>>> would be so I felt the need to prove it to my self in code before 
>>>>>>>> wasting
>>>>>>>> your time, but please don't hesitate to point to costs I might be 
>>>>>>>> missing
>>>>>>>> or anything that looks like a mistake, or say why this was a total 
>>>>>>>> waste of
>>>>>>>> time :)
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> -Gus
>>>>>>>> --
>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> http://www.needhamsoftware.com (work)
>>>>>> http://www.the111shift.com (play)
>>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://about.me/markrmiller
>>>>>
>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Reply via email to