[ 
https://issues.apache.org/jira/browse/SOLR-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083884#comment-14083884
 ] 

Shai Erera edited comment on SOLR-912 at 8/3/14 6:22 AM:
---------------------------------------------------------

Resurrecting that issue, I reviewed NamedList and I don't understand why it has 
to be so complicated:

* Its {{<T>}} generic results in TONS of warnings in eclipse, for no good 
reason. I don't buy this comment made on the issue that it's better to make it 
generic than not. If we have a generic API which isn't used like that, it calls 
for a bad API IMO. From what I can see, NamedList is not supposed to be generic 
at all, as its main purpose is to allow you to store a heterogeneous list of 
{{<name,value>}} pairs, where name is always String, and the type of {{value}} 
may differ. If we want to make it more convenient for people who know e.g. all 
values are Strings, we can add sugar methods like {{getInt(), getString()...}}. 
I've also briefly reviewed some classes that use NamedList (outside of tests), 
and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic 
from the API signature.

* There is what seems to be a totally redundant {{SimpleOrderedMap}} class, 
which has contradicting documentation in its class-jdocs:
** _a {{NamedList}} where access by key is more important than maintaining 
order_
** _This class does not provide efficient lookup by key_

But the class doesn't add any additional data structures, contains only 3 ctors 
which delegate as-is to NamedList and offers a clone() which is identical to 
NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think 
this class is just confusing and has to go away.

Leaving performance aside for a second, NamedList could simply hold an internal 
{{Map<String,List<Object>>}} to enable efficient access by key, remove all 
values of a key, access a key's values in order etc. It doesn't allow accessing 
the {{<name,value>}} pairs in any order though, i.e. {{getVal\(i\)}}. I don't 
know how important is this functionality though .. i.e. if we replaced it with 
a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of 
sure it does, but there are so many uses of NamedList across the Solr code base 
that I might be missing a case which won't like it. So I'd like to ask the Solr 
folks who know this code better than me -- how important is 
{{getName/Val\(i\)}}?

Now back to performance, for a sec, in order to not always allocate a 
{{List<Object>}} when NamedList is used to hold only one value per parameter, 
we can either:

* Use Collections.singletonList() on first _add_, and change to a concrete List 
on the second _add_ only.
* Use an {{Object[]}}, it's less expensive than a List object.
* Use a Map<String,Object> internally and do instanceof checks on add/get as 
appropriate.

BUT, if accessing/removing values by name is not important and it's OK if 
get\(i\) is run on O(N), we can simply simplify the class, like Yonik's 
proposal above, to hold an Object[] array (instead of List). But I think we 
should remove the generic anyway.

Maybe we should break this down into 3 issues:

* Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can 
deprecate and move all uses of it to NamedList directly.
* Remove the generics from NamedList's API. We can add sugar getters for 
specific types if we want.
* Simplify NamedList internal implementation. On the performance side -- how 
critical is NamedList on the execution path? I don't like micro-benchmarks too 
much, so if NamedList is only a fraction of an entire execution path, I'd 
rather it's even a tad slower but readable and easier to use/maintain, than if 
it's overly complicated only to buy us a few nanos in the overall request.


was (Author: shaie):
Resurrecting that issue, I reviewed NamedList and I don't understand why it has 
to be so complicated:

* Its {{<T>}} generic results in TONS of warnings in eclipse, for no good 
reason. I don't buy this comment made on the issue that it's better to make it 
generic than not. If we have a generic API which isn't used like that, it calls 
for a bad API IMO. From what I can see, NamedList is not supposed to be generic 
at all, as its main purpose is to allow you to store a heterogeneous list of 
{{<name,value>}} pairs, where name is always String, and the type of {{value}} 
may differ. If we want to make it more convenient for people who know e.g. all 
values are Strings, we can add sugar methods like {{getInt(), getString()...}}. 
I've also briefly reviewed some classes that use NamedList (outside of tests), 
and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic 
from the API signature.

* There is what seems to be a totally redundant {{SimpleOrderedMap}} class, 
which has contradicting documentation in its class-jdocs:
** _a {{NamedList}} where access by key is more important than maintaining 
order_
** _This class does not provide efficient lookup by key_

But the class doesn't add any additional data structures, contains only 3 ctors 
which delegate as-is to NamedList and offers a clone() which is identical to 
NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think 
this class is just confusing and has to go away.

Leaving performance aside for a second, NamedList could simply hold an internal 
{{Map<String,List<Object>>}} to enable efficient access by key, remove all 
values of a key, access a key's values in order etc. It doesn't allow accessing 
the {{<name,value>}} pairs in any order though, i.e. {{getVal(i)}}. I don't 
know how important is this functionality though .. i.e. if we replaced it with 
a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of 
sure it does, but there are so many uses of NamedList across the Solr code base 
that I might be missing a case which won't like it. So I'd like to ask the Solr 
folks who know this code better than me -- how important is {{getName/Val(i)}}?

Now back to performance, for a sec, in order to not always allocate a 
{{List<Object>}} when NamedList is used to hold only one value per parameter, 
we can either:

* Use Collections.singletonList() on first _add_, and change to a concrete List 
on the second _add_ only.
* Use an {{Object[]}}, it's less expensive than a List object.
* Use a Map<String,Object> internally and do instanceof checks on add/get as 
appropriate.

BUT, if accessing/removing values by name is not important and it's OK if 
get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal 
above, to hold an Object[] array (instead of List). But I think we should 
remove the generic anyway.

Maybe we should break this down into 3 issues:

* Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can 
deprecate and move all uses of it to NamedList directly.
* Remove the generics from NamedList's API. We can add sugar getters for 
specific types if we want.
* Simplify NamedList internal implementation. On the performance side -- how 
critical is NamedList on the execution path? I don't like micro-benchmarks too 
much, so if NamedList is only a fraction of an entire execution path, I'd 
rather it's even a tad slower but readable and easier to use/maintain, than if 
it's overly complicated only to buy us a few nanos in the overall request.

> org.apache.solr.common.util.NamedList - Typesafe efficient variant - 
> ModernNamedList introduced - implementing the same API as NamedList
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-912
>                 URL: https://issues.apache.org/jira/browse/SOLR-912
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>         Environment: Tomcat 6, JRE 6, Solr 1.3+ nightlies 
>            Reporter: Karthik K
>            Priority: Minor
>         Attachments: NLProfile.java, SOLR-912.patch, SOLR-912.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The implementation of NamedList - while being fast - is not necessarily 
> type-safe. I have implemented an additional implementation of the same - 
> ModernNamedList (a type-safe variation providing the same interface as 
> NamedList) - while preserving the semantics in terms of ordering of elements 
> and allowing null elements for key and values (keys are always Strings , 
> while values correspond to generics ). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to