[ 
https://issues.apache.org/jira/browse/SOLR-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288089#comment-16288089
 ] 

ASF subversion and git services commented on SOLR-11695:
--------------------------------------------------------

Commit 2990c88a927213177483b61fe8e6971df04fc3ed in lucene-solr's branch 
refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2990c88 ]

Beef up testing of json.facet 'refine:simple' when dealing with 'Long Tail' 
terms

In an attempt to get more familiar with json.facet refinement, I set out to try 
and refactor/generalize/clone
some of the existing facet.pivot refinement tests to assert that json.facet 
could produce the same results.
This test is a baby step towards doing that: Cloning 
DistributedFacetPivotLongTailTest into
DistributedFacetSimpleRefinementLongTailTest (with shared index building code).

Along the way, I learned that the core logic of 'refine:simple' is actually 
quite different than how facet.field
& facet.pivot work (see discussion in SOLR-11733), so they do *NOT* produce the 
same results in many "Long Tail"
situations.  As a result, many of the logic/assertions 
in DistributedFacetSimpleRefinementLongTailTest are very
different than their counterparts in DistributedFacetPivotLongTailTest, with 
detailed explanations in comments.

Hopefully this test will prove useful down the road to anyone who might want to 
compare/contrast facet.pivot
with json.facet, and to prevent regressions in 'refine:simple' if/when we add 
more complex refinement
approaches in the future.

There are also a few TODOs in the test related to some other small 
discrepancies between json.facet and
stats.field, for which I opened issues along the way, indicating where the 
tests should be modified once those issues are
addressed in json.facet...

 - SOLR-11706: support for multivalued numeric fields in stats
 - SOLR-11695: support for 'missing()' & 'num_vals()' (aka: 'count' from 
stats.field) numeric stats
 - SOLR-11725: switch from 'uncorrected stddev' to 'corrected stddev'
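
For the SOLR-11725 item, the two estimators differ only in the divisor. A 
minimal sketch using Python's standard statistics module rather than any Solr 
code (the sample values are invented for illustration):

```python
import statistics

# Invented sample values, not taken from the test index.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# "Uncorrected" (population) standard deviation: divides by N.
uncorrected = statistics.pstdev(values)

# "Corrected" (sample) standard deviation: divides by N - 1.
corrected = statistics.stdev(values)

print(uncorrected, corrected)
```

The corrected value is always at least as large as the uncorrected one, and 
the gap shrinks as the number of values grows, so assertions written against 
one estimator would need adjusting for the other.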


> JSON FacetModule needs equivalents for StatsComponent's "count" and "missing" 
> features
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-11695
>                 URL: https://issues.apache.org/jira/browse/SOLR-11695
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>
> StatsComponent supports stats named "count" and "missing":
> * count: for the set of documents we're computing stats over, "how many 
> _non-distinct_ values exist in those documents in the specified field?" (or 
> in the case of an arbitrary function: "in how many of these documents does 
> true==ValueSource.exist()" ?)
> ** not to be confused with the number of _unique_ values (approx "cardinality" 
> or exact "countDistinct")
> * missing: for the set of documents we're computing stats over, "how many of 
> those documents do not have any value in the specified field?" (or in the 
> case of an arbitrary function: "in how many of these documents does 
> false==ValueSource.exist()" ?)
> (NOTE: for a single valued field, these are essentially inverses of each 
> other, but for multivalued fields "count" actually returns the total number 
> of "value instances" not just the number of docs that have at least one value)
> AFAICT there is no equivalent functionality supported by the JSON 
> FacetModule, which will be a blocker preventing some users from migrating 
> from using stats.field (or facet.pivot+stats.field) to json.facet.
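
The "count" and "missing" semantics described above can be sketched in plain 
Python. This is only an illustration of the definitions in the issue 
description, not Solr code: the function name count_and_missing, the dict-based 
documents, and the "price" field are all hypothetical.

```python
def count_and_missing(docs, field):
    """Return (count, missing) for `field` over `docs`.

    count:   total number of value instances (non-distinct), so a
             multivalued field contributes one per value.
    missing: number of documents with no value at all in the field.
    """
    count = 0
    missing = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            missing += 1
        elif isinstance(value, (list, tuple)):
            # Multivalued field: every value instance counts,
            # and an empty list is treated as having no value.
            if value:
                count += len(value)
            else:
                missing += 1
        else:
            count += 1
    return count, missing


# Hypothetical documents: one multivalued, one single valued, two without a value.
docs = [
    {"id": 1, "price": [10, 10, 20]},
    {"id": 2, "price": 5},
    {"id": 3},
    {"id": 4, "price": []},
]

count, missing = count_and_missing(docs, "price")
distinct = {v for d in docs
            for v in (d.get("price") if isinstance(d.get("price"), list)
                      else [d["price"]] if "price" in d else [])}
print(count, missing, len(distinct))
```

Here "count" is 4 even though only two documents have a value and only three 
distinct values exist, which is the distinction the NOTE about multivalued 
fields draws.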



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
