[
https://issues.apache.org/jira/browse/SOLR-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083884#comment-14083884
]
Shai Erera edited comment on SOLR-912 at 8/3/14 6:21 AM:
---------------------------------------------------------
Resurrecting that issue, I reviewed NamedList and I don't understand why it has
to be so complicated:
* Its {{<T>}} generic results in TONS of warnings in eclipse, for no good
reason. I don't buy this comment made on the issue that it's better to make it
generic than not. If we have a generic API which isn't used like that, it calls
for a bad API IMO. From what I can see, NamedList is not supposed to be generic
at all, as its main purpose is to allow you to store a heterogeneous list of
{{<name,value>}} pairs, where name is always String, and the type of {{value}}
may differ. If we want to make it more convenient for people who know e.g. all
values are Strings, we can add sugar methods like {{getInt(), getString()...}}.
I've also briefly reviewed some classes that use NamedList (outside of tests),
and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic
from the API signature.
* There is what seems to be a totally redundant {{SimpleOrderedMap}} class,
which has contradicting documentation in its class-jdocs:
** _a {{NamedList}} where access by key is more important than maintaining
order_
** _This class does not provide efficient lookup by key_
But the class doesn't add any additional data structures, contains only 3 ctors
which delegate as-is to NamedList and offers a clone() which is identical to
NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think
this class is just confusing and has to go away.
Leaving performance aside for a second, NamedList could simply hold an internal
{{Map<String,List<Object>>}} to enable efficient access by key, remove all
values of a key, access a key's values in order etc. It doesn't allow accessing
the {{<name,value>}} pairs in any order though, i.e. {{getVal(i)}}. I don't
know how important is this functionality though .. i.e. if we replaced it with
a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of
sure it does, but there are so many uses of NamedList across the Solr code base
that I might be missing a case which won't like it. So I'd like to ask the Solr
folks who know this code better than me -- how important is {{getName/Val(i)}}?
Now back to performance, for a sec, in order to not always allocate a
{{List<Object>}} when NamedList is used to hold only one value per parameter,
we can either:
* Use Collections.singletonList() on first _add_, and change to a concrete List
on the second _add_ only.
* Use an {{Object[]}}, it's less expensive than a List object.
* Use a Map<String,Object> internally and do instanceof checks on add/get as
appropriate.
BUT, if accessing/removing values by name is not important and it's OK if
get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal
above, to hold an Object[] array (instead of List). But I think we should
remove the generic anyway.
Maybe we should break this down into 3 issues:
* Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can
deprecate and move all uses of it to NamedList directly.
* Remove the generics from NamedList's API. We can add sugar getters for
specific types if we want.
* Simplify NamedList internal implementation. On the performance side -- how
critical is NamedList on the execution path? I don't like micro-benchmarks too
much, so if NamedList is only a fraction of an entire execution path, I'd
rather it's even a tad slower but readable and easier to use/maintain, than if
it's overly complicated only to buy us a few nanos in the overall request.
was (Author: shaie):
Resurrecting that issue, I reviewed NamedList and I don't understand why it has
to be so complicated:
* Its {{<T>}} generic cause nothing but TONS of warnings in eclipse, for no
good reason. I don't buy this comment made on the issue that it's better to
make it generic than not. If we have a generic API which isn't used like that,
it calls for a bad API IMO. From what I can see, NamedList is not supposed to
be generic at all, as its main purpose is to allow you to store a heterogeneous
list of {{<name,value>}} pairs, where name is always String, and the type of
{{value}} may differ. If we want to make it more convenient for people who know
e.g. all values are Strings, we can add sugar methods like {{getInt(),
getString()...}}. I've also briefly reviewed some classes that use NamedList
(outside of tests), and none seem to rely on {{<T>}} at all. So I'd rather we
remove that generic from the API signature.
* There is what seems to be a totally redundant {{SimpleOrderedMap}} class,
which has contradicting documentation in its ctor:
** _a {{NamedList}} where access by key is more important than maintaining
order_
** _This class does not provide efficient lookup by key_
But the class doesn't add any additional data structures, contains only 3 ctors
which delegate as-is to NamedList and offers a clone() which is identical to
NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think
this class is just confusing and has to go away.
Leaving performance aside for a second, NamedList could simply hold an internal
{{Map<String,List<Object>>}} to enable efficient access by key, remove all
values of a key, access a key's values in order etc. It doesn't allow accessing
the {{<name,value>}} pairs in any order though, i.e. {{getVal(i)}}. I don't
know how important is this functionality though .. i.e. if we replaced it with
a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of
sure it does, but there are so many uses of NamedList across the Solr code base
that I might be missing a case which won't like it. So I'd like to ask the Solr
folks who know this code better than me -- how important is {{getName/Val(i)}}?
Now back to performance, for a sec, in order to not always allocate a
{{List<Object>}} when NamedList is used to hold only one value per parameter,
we can either:
* Use Collections.singletonList() on first _add_, and change to a concrete List
on the second _add_ only.
* Use an {{Object[]}}, it's less expensive than a List object.
* Use a Map<String,Object> internally and do instanceof checks on add/get as
appropriate.
BUT, if accessing/removing values by name is not important and it's OK if
get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal
above, to hold an Object[] array (instead of List). But I think we should
remove the generic anyway.
Maybe we should break this down into 3 issues:
* Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can
deprecate and move all uses of it to NamedList directly.
* Remove the generics from NamedList's API. We can add sugar getters for
specific types if we want.
* Simplify NamedList internal implementation. On the performance side -- how
critical is NamedList on the execution path? I don't like micro-benchmarks too
much, so if NamedList is only a fraction of an entire execution path, I'd
rather it's even a tad slower but readable and easier to use/maintain, than if
it's overly complicated only to buy us a few nanos in the overall request.
> org.apache.solr.common.util.NamedList - Typesafe efficient variant -
> ModernNamedList introduced - implementing the same API as NamedList
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-912
> URL: https://issues.apache.org/jira/browse/SOLR-912
> Project: Solr
> Issue Type: Improvement
> Components: search
> Environment: Tomcat 6, JRE 6, Solr 1.3+ nightlies
> Reporter: Karthik K
> Priority: Minor
> Attachments: NLProfile.java, SOLR-912.patch, SOLR-912.patch
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> The implementation of NamedList - while being fast - is not necessarily
> type-safe. I have implemented an additional implementation of the same -
> ModernNamedList (a type-safe variation providing the same interface as
> NamedList) - while preserving the semantics in terms of ordering of elements
> and allowing null elements for key and values (keys are always Strings ,
> while values correspond to generics ).
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]