[ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
-------------------------------
    Attachment: LUCENE-7590.patch

Patch implements a {{DocValuesStatsCollector}}. Note some key design decisions:

A {{DocValuesStats}} is responsible for providing the specific 
{{DocValuesIterator}} for a {{LeafReaderContext}}. It then accumulates the 
values and computes {{missing}} and {{count}}, leaving {{min}} and {{max}} to 
the actual implementation. Also, this class does not define a {{mean}}, as at 
least for now I'm not sure how the mean value of a {{SortedSetDocValues}} is 
defined.

There is also an abstract {{NumericDocValuesStats}} implementation for 
single-valued numeric DV fields, which adds a {{mean}} statistic and has two 
concrete implementations: {{LongNumericDocValuesStats}} and 
{{DoubleNumericDocValuesStats}}.
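
To make the split of responsibilities concrete, here is a minimal, Lucene-free sketch of the idea. It is not the code in the patch: {{StatsSketch}}, {{LongStatsSketch}} and {{doAccumulate}} are hypothetical names, and a {{null}} value stands in for a document that has no value for the field.

```java
// Hypothetical model of the hierarchy described above; it does not use Lucene.
abstract class StatsSketch<T> {
    private int count;    // documents that had a value
    private int missing;  // documents that had no value

    /** Called once per collected document; null means "no value". */
    final void accumulate(T value) {
        if (value == null) {
            missing++;
        } else {
            count++;
            doAccumulate(value, count);
        }
    }

    /** Subclasses maintain min, max and any extra statistics here. */
    protected abstract void doAccumulate(T value, int count);

    final int count() { return count; }
    final int missing() { return missing; }
    abstract T min();
    abstract T max();
}

/** Single-valued numeric variant that adds a running mean. */
final class LongStatsSketch extends StatsSketch<Long> {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private double mean = 0.0;

    @Override
    protected void doAccumulate(Long value, int count) {
        long v = value;
        min = Math.min(min, v);
        max = Math.max(max, v);
        // Incremental mean, so no long sum is kept that could overflow.
        mean += (v - mean) / count;
    }

    @Override Long min() { return min; }
    @Override Long max() { return max; }
    double mean() { return mean; }
}
```

The base class owns {{count}} and {{missing}} because they mean the same thing for every DV type, while {{min}}, {{max}} and {{mean}} depend on the value type and are left to the subclass.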

This hierarchy should allow us to add further statistics for {{SortedSet}} and 
{{SortedNumeric}} DV fields. I did not implement them yet, as I'm not sure 
about some of the statistics (e.g. should the {{mean}} stat of a 
{{SortedNumeric}} be the mean across all values, or the minimum per document or 
...). Let's discuss that separately.
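
The ambiguity is easy to see on a small example. The sketch below is illustrative only: {{MultiValuedMeanSketch}} and its methods are hypothetical, and each inner array stands in for the values of one document in a multi-valued numeric field. It computes two of the candidate definitions mentioned above.

```java
// Hypothetical illustration; each long[] holds the values of one document.
final class MultiValuedMeanSketch {

    /** Candidate 1: mean over every value of every document. */
    static double meanOfAllValues(long[][] docs) {
        long n = 0;
        double sum = 0.0;
        for (long[] values : docs) {
            for (long v : values) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    /** Candidate 2: mean of each document's minimum value. */
    static double meanOfPerDocMin(long[][] docs) {
        long n = 0;
        double sum = 0.0;
        for (long[] values : docs) {
            if (values.length == 0) {
                continue; // document has no value for the field
            }
            long min = values[0];
            for (long v : values) {
                min = Math.min(min, v);
            }
            sum += min;
            n++;
        }
        return n == 0 ? Double.NaN : sum / n;
    }
}
```

For three documents with values {{[1, 9]}}, {{[2]}} and {{[3, 4, 5]}}, the first definition gives 24 / 6 = 4.0 and the second gives (1 + 2 + 3) / 3 = 2.0, so the choice changes the reported statistic.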

Also, note that I had to make {{DocValuesIterator}} public in order to declare 
it in this collector.

If you're OK with the design and implementation, I want to separate 
{{DocValuesStats}} into its own file, for clarity. I did not do it yet though.

> Add DocValues statistics helpers
> --------------------------------
>
>                 Key: LUCENE-7590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7590
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/misc
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

