[
https://issues.apache.org/jira/browse/SOLR-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-6968:
---------------------------
Attachment: SOLR-6968.patch
Updated patch with more tests.
My current TODO list...
{noformat}
- 6 < regwidth makes no sense?
  - even at (min) log2m==4, isn't regwidth==6 big enough for all possible
    (hashed) long values?
- prehashed support
  - need to sanity/error check that the field is a long
  - add an update processor to make this easy to do at index time
- tuning knobs
  - memory vs accuracy (log2m)
    - idea: (least ram) 0 < accuracy < 1 (most accurate)
  - scale
    - max cardinality estimable (regwidth)
    - perhaps hardcode regwidth==6 ? expert only option to adjust?
    - pick regwidth based on field type? (int/enum have fewer in general)
    - pick regwidth based on index stats? max out based on total terms in
      field?
      - or for single valued fields: max out based on numDocs
- HLL must use same hash seed, but does it support union when log2m and
  regwidth are different?
- convenience equivalence with countDistinct in solrj response obj ?
{noformat}
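To make the log2m/regwidth trade-offs in the list above concrete, here is a rough sketch of the arithmetic, assuming textbook HyperLogLog with a 64-bit hash and densely packed registers. The function names are made up for illustration, the accuracy-to-log2m mapping (including its upper bound of 18) is purely hypothetical, and the HLL library used in the patch may pack or bound things differently.

```python
import math

HASH_BITS = 64  # assumption: every value is hashed to a 64-bit long


def hll_register_bytes(log2m: int, regwidth: int) -> int:
    """Bytes for a dense array of 2**log2m registers, regwidth bits each."""
    return math.ceil((1 << log2m) * regwidth / 8)


def hll_relative_error(log2m: int) -> float:
    """Textbook HyperLogLog standard error, roughly 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(1 << log2m)


def max_register_value(log2m: int, hash_bits: int = HASH_BITS) -> int:
    """Largest rank a register may need to hold.

    log2m bits pick the register; if all remaining bits are zero the
    rank is (remaining bits + 1).
    """
    return hash_bits - log2m + 1


def regwidth_is_sufficient(log2m: int, regwidth: int,
                           hash_bits: int = HASH_BITS) -> bool:
    """True if a regwidth-bit register can store the largest possible rank."""
    return max_register_value(log2m, hash_bits) <= (1 << regwidth) - 1


def accuracy_to_log2m(accuracy: float, min_log2m: int = 4,
                      max_log2m: int = 18) -> int:
    """Hypothetical mapping for the '0 < accuracy < 1' idea.

    Linear interpolation between min_log2m (least RAM) and max_log2m
    (most accurate); max_log2m=18 is an arbitrary placeholder.
    """
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return round(min_log2m + accuracy * (max_log2m - min_log2m))


if __name__ == "__main__":
    for log2m in (4, 11, 16):
        print(log2m,
              hll_register_bytes(log2m, 6),
              round(hll_relative_error(log2m), 4),
              regwidth_is_sufficient(log2m, 6))
```

Under these assumptions, log2m==4 leaves 60 hash bits, so the largest rank is 61, which fits in a 6-bit register (max 63) but not in a 5-bit one (max 31). That is consistent with the first TODO item's suspicion that regwidth > 6 buys nothing for 64-bit hashes.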
> add hyperloglog in statscomponent as an approximate count
> ---------------------------------------------------------
>
> Key: SOLR-6968
> URL: https://issues.apache.org/jira/browse/SOLR-6968
> Project: Solr
> Issue Type: Sub-task
> Reporter: Hoss Man
> Attachments: SOLR-6968.patch, SOLR-6968.patch, SOLR-6968.patch
>
>
> stats component currently supports "calcDistinct" but it's terribly
> inefficient -- especially in distributed mode.
> we should add support for using hyperloglog to compute an approximate count
> of distinct values (using localparams via SOLR-6349 to control the precision
> of the approximation)
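
For readers unfamiliar with why HyperLogLog suits the distributed case described above: each shard can build its own small sketch, and the coordinator merges them with a register-wise max instead of shipping every distinct value. Below is a toy illustration of that idea, not the code in the patch. The class name ToyHLL, the SHA-1 based hash, and the default log2m of 11 are all assumptions made for the example.

```python
import hashlib
import math


class ToyHLL:
    """Minimal textbook HyperLogLog over a 64-bit hash (illustration only)."""

    def __init__(self, log2m: int = 11):
        if log2m < 7:
            # the alpha constant used below is only valid for m >= 128
            raise ValueError("this toy needs log2m >= 7")
        self.log2m = log2m
        self.m = 1 << log2m
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Every sketch must use the same hash, or unions are meaningless.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        h = self._hash64(value)
        rest_bits = 64 - self.log2m
        idx = h >> rest_bits                  # top log2m bits pick a register
        rest = h & ((1 << rest_bits) - 1)     # remaining bits give the rank
        rank = rest_bits - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other: "ToyHLL") -> "ToyHLL":
        # This toy refuses mismatched sizes; whether a real library can
        # reconcile different log2m/regwidth is the open TODO question.
        if self.log2m != other.log2m:
            raise ValueError("log2m must match to union in this sketch")
        merged = ToyHLL(self.log2m)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)    # small-range correction
        return raw


if __name__ == "__main__":
    shard_a, shard_b = ToyHLL(), ToyHLL()
    for v in range(0, 10000):
        shard_a.add(v)
    for v in range(5000, 15000):
        shard_b.add(v)
    print(round(shard_a.union(shard_b).cardinality()))  # true count: 15000
```

The merged registers are identical to those of a single sketch fed all the values, so the per-shard payload stays at a fixed 2**log2m registers no matter how many distinct values each shard holds.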