Re: Major proposal: NamedList in Solr 10

David Smiley Mon, 23 Dec 2024 09:42:02 -0800

I'm sympathetic to your questioning of NamedList's alleged performance
benefits (or even NamedList's very existence) but I don't see why that
would have any coupling to my proposal.  Unless you think we shouldn't
bother doing anything drastic/sweeping (possibly breaking users) if it's
likely we don't even want it?  If we did move away from it, then in a
number of places we would have to enforce uniqueness in configuration that
currently has no such constraint (think init(NamedList) on all plugins read
from solrconfig.xml).  At least my proposal is fairly mechanical & simple,
and a real improvement both in making the intent of the calling code
explicit and in flipping some responses that should have been map-like in
JSON but aren't.
I don't think it conflicts with your aims.  I'm sure you can appreciate
that it will take time to get away from using NamedList, so let's do some
improvements like this.
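
To make the JSON difference concrete, here is a minimal, self-contained
sketch. It is not Solr code and does not use NamedList or the real response
writers; the class and method names are hypothetical. It only mimics the two
shapes under discussion: the flat array of alternating keys and values (what
json.nl=flat produces for a plain NamedList) versus the JSON object that a
SimpleOrderedMap renders as. It also shows why repeated keys and the map
shape don't mix.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; NOT Solr's JSON writer.
class JsonNlSketch {

  // Quote strings, print everything else as-is. No escaping: sketch only.
  static String quote(Object v) {
    return (v instanceof String) ? "\"" + v + "\"" : String.valueOf(v);
  }

  // Flat style: keys and values alternate in a single JSON array.
  static String renderFlat(List<? extends Map.Entry<String, ?>> pairs) {
    return pairs.stream()
        .map(e -> quote(e.getKey()) + "," + quote(e.getValue()))
        .collect(Collectors.joining(",", "[", "]"));
  }

  // Map style: a JSON object, which is only sane when keys are unique.
  static String renderMap(List<? extends Map.Entry<String, ?>> pairs) {
    return pairs.stream()
        .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
        .collect(Collectors.joining(",", "{", "}"));
  }

  // True when at least one key occurs more than once.
  static boolean hasDuplicateKeys(List<? extends Map.Entry<String, ?>> pairs) {
    long distinct = pairs.stream().map(e -> e.getKey()).distinct().count();
    return distinct != pairs.size();
  }

  public static void main(String[] args) {
    var unique = List.of(Map.entry("numFound", 3), Map.entry("start", 0));
    System.out.println(renderFlat(unique)); // ["numFound",3,"start",0]
    System.out.println(renderMap(unique));  // {"numFound":3,"start":0}

    var repeated = List.of(Map.entry("fq", "a"), Map.entry("fq", "b"));
    System.out.println(hasDuplicateKeys(repeated)); // true
    System.out.println(renderMap(repeated));        // {"fq":"a","fq":"b"}
  }
}
```

The last line of output is the duplicate-key JSON object Gus mentions: a
typical JSON parser will keep only one of the two "fq" values, which is why
a type that promises unique keys needs to be distinct from one that doesn't.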


On Mon, Dec 23, 2024 at 11:22 AM Gus Heck <[email protected]> wrote:

> With respect to the effect on JSON output I'm for that. A single
> dimensional Array encoding a 2D array to communicate key/value pairs is
> very contrary to any programmer's expectations (at least within this
> century). So improving the rendering of our JSON at some point: definitely
> +1. While we are at it, let's fix the cases that emit JSON objects with
> duplicate keys!
>
> The next major thought I have is that this sort of effort should be coupled
> with a very clear, modern demonstration of the actual performance benefits
> of NamedList/SimpleOrderedMap. As I have noted elsewhere
> <https://issues.apache.org/jira/browse/SOLR-912?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17897044#comment-17897044>
> all of its other claimed benefits seem to be untrue. I would pit
> SimpleOrderedMap against java.util.HashMap. I have a strong suspicion that
> the only case that SOM might win is in the case of entirely novel keys
> where the string hashcode needs to be calculated fresh for each insertion.
> In any case with a string used multiple times, that hashcode is cached, and
> I expect the JVM wizards at Sun/Oracle will have strategies for identifying
> and interning strings used frequently. But what's the magnitude of that
> benefit? Is it worth it? Will that benefit outweigh optimizations Jackson
> might have for handling maps (certainly a core use case for them)?
>
> Even if NamedList was a GREAT idea in January of 2006 when it was added,
> almost 20 years of library and JVM development may well have made it
> obsolete (or not! The point is we don't know).
>
> So to me the first step of this is verifying that SOM is really what we
> want in the first place.
>
>
> On Fri, Dec 20, 2024 at 7:01 PM David Smiley <[email protected]> wrote:
>
> > Problem: Today, lots of Solr code will create a NamedList and forget to
> > consider creating a SimpleOrderedMap (subclassing NamedList) instead.  The
> > vast majority of NamedLists I've seen *should* be a SimpleOrderedMap but
> > were not created as such.  The distinction is highly subtle, affecting how
> > Solr serializes a NamedList to JSON -- whether it should render it as a Map
> > or another strategy dependent on the json.nl parameter.  The subtle-ness
> > means lack of testing and ease of breaking compatibility.  And not using
> > SimpleOrderedMap when we should is annoying to a JSON consumer who then has
> > to parse it weirdly, maybe flip-flopping with json.nl.
> >
> > Proposal: Strongly differentiate creation of NamedList instances between
> > SimpleOrderedMap (exists), and a new subclass to convey that keys may
> > repeat.  NamedList will become abstract and gain some factory methods to
> > instantiate one of these concisely that basically everyone will use.  The
> > exact naming is TBD for JIRA.  Adding the factory methods and the type can
> > come to 9.9 if a user wants to start using them, but making it abstract and
> > making a sweeping change across the codebase is 10 only.  The sweeping
> > changes will *not* change any declared parameters/fields/variables to be
> > different from NamedList to a specific type, thus the change won't be too
> > huge.  Fortunately, most/all of the javabin consuming code won't have a
> > compatibility problem since NamedLists that become a SimpleOrderedMap are
> > still nonetheless a NamedList.  But users requesting JSON will in many
> > cases find the JSON structure of many of Solr's APIs to have changed, and
> > I'm not sure we can enumerate them.  This can only be done in a major
> > release -- Solr 10, especially so sweeping of a change.  Are we okay with
> > this?
> >
> > I know many of us have expressed a distaste for NamedList generally.  There
> > are several old JIRA issues about switching away from NamedList in many
> > places.  I imagine a distant Solr 11 with V2 complete (100% new v2 not old
> > v2) and V1 gone, there will be far fewer NamedLists running around thanks
> > to embracing annotated classes with JSON serialization instead.  Still,
> > NamedList should not be frozen, waiting for that future to unfold.
> >
> > For reference: all unresolved JIRA issues with "NamedList" in the summary:
> >
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20summary%20~%20NamedList%20AND%20resolution%20is%20EMPTY%20ORDER%20BY%20created%20DESC
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
