Robert Muir created SOLR-4499:
---------------------------------
Summary: StatsComponent could use some serious TLC
Key: SOLR-4499
URL: https://issues.apache.org/jira/browse/SOLR-4499
Project: Solr
Issue Type: Bug
Reporter: Robert Muir
Most of these problems are actually documented on the wiki page, but here is my
go at ideas for improving it, after reviewing this thing today.
# The external API should be made performant (e.g. some sort of paging for the
stats.facet, vs returning ALL values)
# The code for multi-valued fields is clearly broken: it tries to use a
combination of UninvertedField with a single-valued fieldcache for multivalued
fields.
# The behavior for multi-valued fields could be unexpected: whether its
UninvertedField or DocValues, these datastructures return the *unique* set of
ordinals for the document. So I think it can be very misleading to return stats
like 'sum' for multivalued fields.
# The stats returned should be implemented in ways that are fast. For example
the string case returns min/max, but does this by looking up every single
ordinal to term and using string.compareTo. the ords are themselves comparable,
satisfying count/missing/min/max can all be done with 2 ord->term lookups per
segment. These are also the only stats I think multi-valued numerics should
return (see above).
# Things like accumulate(NamedList) appear to have scary runtime (I think this
one is only used for merging distributed results?). They should not use the
O\(n) get() method over and over in accumulate() but instead do a single pass
through the list.
Finally the code is pretty difficult to follow, and tests are inadequate for
what all is going on here.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]