[ https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-7590:
-------------------------------
Attachment: LUCENE-7590.patch

Patch implements a {{DocValuesStatsCollector}}. Some key design decisions:

* A {{DocValuesStats}} is responsible for providing the specific {{DocValuesIterator}} for a {{LeafReaderContext}}. It then accumulates each document's value and computes the statistics. The base class computes {{missing}} and {{count}} itself, leaving {{min}} and {{max}} to the concrete implementations. It also does not define a {{mean}}, since at least for now I'm not sure how the mean value of a {{SortedSetDocValues}} field would be defined.
* An abstract {{NumericDocValuesStats}} implementation for single-valued numeric DV fields, which also adds a {{mean}} statistic, with two concrete implementations: {{LongNumericDocValuesStats}} and {{DoubleNumericDocValuesStats}}.

This hierarchy should allow us to add further statistics for {{SortedSet}} and {{SortedNumeric}} DV fields. I did not implement them yet, as I'm not sure about some of the statistics (e.g. should the {{mean}} of a {{SortedNumeric}} field be the mean across all values, or the mean of each document's minimum value, or ...). Let's discuss that separately.

Note also that I had to make {{DocValuesIterator}} public in order to reference it in this collector.

If you're OK with the design and implementation, I'd like to move {{DocValuesStats}} into its own file, for clarity. I haven't done that yet.

> Add DocValues statistics helpers
> --------------------------------
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/misc
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch,
> LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers that allow
> users to query for the min/max/avg etc. stats of a DV field.
> In this issue I'd like to cover numeric DV, but there's no reason not to add it to other DV
> types too.
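To make the class hierarchy described in the comment concrete, here is a minimal, self-contained sketch in plain Java. It is not the code from LUCENE-7590.patch and has no Lucene dependency: in the described design the stats object obtains a {{DocValuesIterator}} from a {{LeafReaderContext}}, whereas here a hypothetical {{ValueSource}} interface (with invented methods {{hasValue}} and {{rawValue}}) is passed in, and a hypothetical {{StatsCollectorSketch.collect}} stands in for the collector. The base class tracks {{count}} and {{missing}} and leaves {{min}}/{{max}} to subclasses; the numeric subclass adds {{mean}}. The double variant assumes the raw long holds the IEEE-754 bits of the value.

```java
/** Hypothetical per-segment value source (NOT a Lucene API; stands in for a DocValuesIterator). */
interface ValueSource {
    boolean hasValue(int doc); // does this document have a value for the field?
    long rawValue(int doc);    // raw value; only valid when hasValue(doc) is true
}

/** Base stats: computes count and missing, leaves min/max to concrete subclasses. */
abstract class DocValuesStats<T> {
    protected final String field;
    protected int count = 0;   // collected documents that have a value
    protected int missing = 0; // collected documents without a value
    protected T min;
    protected T max;

    protected DocValuesStats(String field, T initialMin, T initialMax) {
        this.field = field;
        this.min = initialMin;
        this.max = initialMax;
    }

    /** Called once per collected document. */
    final void accept(ValueSource values, int doc) {
        if (values.hasValue(doc)) {
            count++;
            doAccumulate(values.rawValue(doc));
        } else {
            missing++;
        }
    }

    /** Subclasses decode the raw value and update min/max (and any extra stats). */
    protected abstract void doAccumulate(long rawValue);

    String field() { return field; }
    int count() { return count; }
    int missing() { return missing; }
    T min() { return min; }
    T max() { return max; }
}

/** Single-valued numeric fields additionally get a mean. */
abstract class NumericDocValuesStats<T extends Number> extends DocValuesStats<T> {
    protected double mean = 0.0;

    protected NumericDocValuesStats(String field, T initialMin, T initialMax) {
        super(field, initialMin, initialMax);
    }

    /** Incremental mean update; accept() has already incremented count. */
    protected final void updateMean(double value) {
        mean += (value - mean) / count;
    }

    double mean() { return mean; }
}

final class LongNumericDocValuesStats extends NumericDocValuesStats<Long> {
    LongNumericDocValuesStats(String field) {
        super(field, Long.MAX_VALUE, Long.MIN_VALUE);
    }

    @Override
    protected void doAccumulate(long v) {
        if (v < min) min = v;
        if (v > max) max = v;
        updateMean(v);
    }
}

final class DoubleNumericDocValuesStats extends NumericDocValuesStats<Double> {
    DoubleNumericDocValuesStats(String field) {
        super(field, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY);
    }

    @Override
    protected void doAccumulate(long rawValue) {
        // Assumption: the raw long holds the IEEE-754 bits of the double value.
        double v = Double.longBitsToDouble(rawValue);
        if (v < min) min = v;
        if (v > max) max = v;
        updateMean(v);
    }
}

/** Hypothetical stand-in for the collector: feeds every doc in [0, maxDoc) to the stats. */
final class StatsCollectorSketch {
    static <S extends DocValuesStats<?>> S collect(S stats, ValueSource values, int maxDoc) {
        for (int doc = 0; doc < maxDoc; doc++) {
            stats.accept(values, doc);
        }
        return stats;
    }
}
```

Usage would be along the lines of {{StatsCollectorSketch.collect(new LongNumericDocValuesStats("price"), source, maxDoc)}}, after which {{count()}}, {{missing()}}, {{min()}}, {{max()}} and {{mean()}} can be read. The sketch updates the mean incrementally rather than keeping a separate sum; the comment does not say which approach the patch takes. A {{SortedNumeric}} or {{SortedSet}} variant would subclass {{DocValuesStats}} directly, which is where the open question about defining {{mean}} comes in.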