[jira] [Commented] (SOLR-6351) Let Stats Hang off of Pivots (via 'tag')

Hoss Man (JIRA) Mon, 22 Sep 2014 18:12:43 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144161#comment-14144161
 ]


Hoss Man commented on SOLR-6351:
--------------------------------


I'm headed out of town for a week, but before i go -- while it's fresh in my 
head -- i wanted to post some notes on what the next steps need to be for this 
based on the current state of trunk (after the refactoring & cleanup done in 
recent issues like SOLR-6354 & SOLR-6507)

{panel:title=next steps}
*Single Node Pivot Tests...*

# apparently we've managed to go this far w/o any simple single-node pivot tests 
other than {{SolrExampleTests}} -- which requires solrj support.  Since it 
would be nice to start with some simple proof that single node pivots+stats 
work, we need to start with some tests
# we should add a simple {{FacetPivotSmallTest}} that uses the same basic data 
and assertions as {{DistributedFacetPivotSmallTest}} but with a single solr 
node and using xpath (instead of depending on solrj).

*Local Pivots + stats...*

# add some logic & a getter to {{StatsField}} to make public a list of the 
"tags" in its local params
# add a {{Map<String,List<StatsField>>}} to {{StatsInfo}} to support a new 
method for looking up the {{List<StatsField>}} corresponding to a given tag 
string.
# Modify the Pivot facet code to check for a "stats" local param:
#* the value of which may be a comma separated list of "tags" to lookup with 
the {{StatsInfo}} instance of the current {{ResponseBuilder}} to get the 
{{StatsField}} instances we want to hang off of our pivots.
#* if there are some {{StatsFields}} to hang off of our pivot, then any code in 
{{PivotFacetProcessor}} which currently calls {{getSubsetSize()}} should call 
{{getSubset()}}; and after *any* call (existing or new) to {{getSubset()}} the 
code should (in addition to adding the set size to the response) pass that 
DocSet to the {{StatsField.computeLocalStatsValues}} and include the resulting 
StatsValues in the response.
# update the previously created {{FacetPivotSmallTest}} to also test hanging 
some stats off of pivots
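
As a rough illustration of the tag lookup described in the steps above, here is a minimal, self-contained sketch. The classes {{TaggedStatsField}} and {{TagIndex}} are hypothetical stand-ins invented for this example; they are not Solr's actual {{StatsField}} / {{StatsInfo}}, and the real method names and shapes may well end up different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StatsTagLookupSketch {

  /** Hypothetical stand-in for StatsField: a key plus the tags from its local params. */
  static final class TaggedStatsField {
    final String key;
    final List<String> tags;

    TaggedStatsField(String key, String... tags) {
      this.key = key;
      this.tags = Collections.unmodifiableList(Arrays.asList(tags));
    }
  }

  /** Hypothetical stand-in for the tag -> fields map proposed for StatsInfo. */
  static final class TagIndex {
    private final Map<String, List<TaggedStatsField>> byTag = new LinkedHashMap<>();

    /** Registers a field under every one of its tags. */
    void add(TaggedStatsField field) {
      for (String tag : field.tags) {
        byTag.computeIfAbsent(tag, t -> new ArrayList<>()).add(field);
      }
    }

    /** Resolves a comma separated "stats" local param value, e.g. "s1,s2". */
    List<TaggedStatsField> resolve(String statsLocalParam) {
      List<TaggedStatsField> result = new ArrayList<>();
      if (statsLocalParam == null) {
        return result;
      }
      for (String tag : statsLocalParam.split(",")) {
        String trimmed = tag.trim();
        if (trimmed.isEmpty()) {
          continue;
        }
        for (TaggedStatsField f : byTag.getOrDefault(trimmed, Collections.emptyList())) {
          if (!result.contains(f)) {
            result.add(f); // a field matching several requested tags is only used once
          }
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    TagIndex index = new TagIndex();
    index.add(new TaggedStatsField("avg_price", "s1"));
    index.add(new TaggedStatsField("user_rating", "s1", "s2"));

    // Prints the key of every field reachable through the requested tags.
    for (TaggedStatsField f : index.resolve("s1,s2")) {
      System.out.println(f.key);
    }
  }
}
```

Unknown tags resolve to an empty list in this sketch; whether the real code should ignore them or fail the request is an open design choice.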

*SolrJ*

# update the SolrJ {{PivotField}} to support having a {{List<FieldStatsInfo>}} 
in it
# update the solrj codecs to know how to populate those if/when the data exists 
in the response
# add some unit tests for this in solrj (no existing unit tests of the pivot or 
stats object creation from responses???)
# update {{SolrExampleTests}} to do some pivots+stats and verify that they can 
be parsed correctly by solrj

*Distributed Pivot + Stats*

# {{PivotFacetValue}} needs to know if/when it should have one or more 
{{StatsValues}} in it and get an empty instance from {{StatsValuesFactory}} for 
each of the applicable {{StatsField}} instances.
# {{PivotFacetValue.createFromNamedLists}} needs to recognize when a shard is 
including a sub-NamedList of stats data, and merge each of those children in 
via the appropriate {{StatsValues.accumulate(NamedList)}} (based on 
{{StatsField.getKey()}})
# at this point we should be able to update {{DistributedFacetPivotSmallTest}} 
to include the same types of pivot+stats additions that were made to 
{{FacetPivotSmallTest}} for checking the single node case, and see distributed 
pivot+stats working.
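
To make the shard merge idea above a bit more concrete, here is a minimal sketch of accumulating per-shard stats by key for a single pivot value. {{PartialStats}}, {{MergedStats}} and {{mergeShard}} are hypothetical names made up for this example; the real code would go through {{StatsValues}} and {{NamedList}} rather than these simplified classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PivotStatsMergeSketch {

  /** Hypothetical partial stats reported by one shard for one pivot value. */
  static final class PartialStats {
    final double min;
    final double max;
    final double sum;
    final long count;

    PartialStats(double min, double max, double sum, long count) {
      this.min = min;
      this.max = max;
      this.sum = sum;
      this.count = count;
    }
  }

  /** Hypothetical accumulator, loosely playing the role of a StatsValues instance. */
  static final class MergedStats {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum = 0.0;
    long count = 0L;

    void accumulate(PartialStats shard) {
      if (shard.count == 0) {
        return; // an empty shard contributes nothing
      }
      min = Math.min(min, shard.min);
      max = Math.max(max, shard.max);
      sum += shard.sum;
      count += shard.count;
    }

    double mean() {
      return count == 0 ? Double.NaN : sum / count;
    }
  }

  /** Merges one shard's per-key stats into the accumulators kept for a pivot value. */
  static void mergeShard(Map<String, MergedStats> merged, Map<String, PartialStats> shardStats) {
    for (Map.Entry<String, PartialStats> e : shardStats.entrySet()) {
      merged.computeIfAbsent(e.getKey(), k -> new MergedStats()).accumulate(e.getValue());
    }
  }

  public static void main(String[] args) {
    Map<String, MergedStats> merged = new LinkedHashMap<>();

    Map<String, PartialStats> shard1 = new LinkedHashMap<>();
    shard1.put("avg_price", new PartialStats(1.0, 5.0, 6.0, 2));

    Map<String, PartialStats> shard2 = new LinkedHashMap<>();
    shard2.put("avg_price", new PartialStats(2.0, 10.0, 12.0, 2));

    mergeShard(merged, shard1);
    mergeShard(merged, shard2);

    MergedStats price = merged.get("avg_price");
    System.out.println("min=" + price.min + " max=" + price.max
        + " count=" + price.count + " mean=" + price.mean());
  }
}
```

Keying the accumulators by the stats key mirrors the "based on {{StatsField.getKey()}}" idea: each shard's sub-list is routed to the accumulator with the matching key.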

*Test, Test, Test*

# at this point we should be able to update the other distributed pivot tests 
with pivot + stats cases to make sure we don't find new bugs
# adding in stats params & assertions to {{DistributedFacetPivotLargeTest}} and 
{{DistributedFacetPivotLongTailTest}} should be straightforward
# {{TestCloudPivotFacet}} will be more interesting due to the randomization...
#* adding new randomized {{stats.field}} params is trivial given all the 
interesting fields already included in the docs
#* with a little record keeping of what {{stats.field}} params we add, we can 
easily tweak the {{facet.pivot}} params to include a {{stats=...}} local param 
to ask for them
#* we'll want a trace param to know if/when to expect stats in the response 
(so we don't overlook bugs where stats are never computed/returned)
#* in {{assertPivotCountsAreCorrect}}, if stats are expected, then instead of a 
simple {{assertNumFound}} on each of the pivot values, we can actually assert 
that when filtering on that pivot value, the _stats_ we get back for the entire 
(filtered) request match the stats we got back as part of the pivot (ie: don't 
just check the {{pivot\[count\]==numFound}}, also check 
{{pivot\[stats\]\[foo\]\[min\]==stats\[foo\]\[min\]}} and 
{{pivot\[stats\]\[foo\]\[max\]==stats\[foo\]\[max\]}}, etc...)
#** the merge logic should be exact for the min/max/missing/count stats, but we 
may need some leniency here for the comparisons of some of the computed stats 
values like sum/mean/stddev/etc... since the order of operations involved in 
the merge may cause intermediate precision loss
{panel}
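
To illustrate the leniency mentioned in the last item of the panel, here is a small sketch of a relative-tolerance comparison for computed stats like sum/mean/stddev. The helper name {{approxEquals}} and the tolerance value are assumptions made for this example, not something that exists in the test code today; min/max/missing/count would still be compared exactly.

```java
class StatsToleranceSketch {

  /** Relative-tolerance comparison for computed stats such as sum, mean or stddev. */
  static boolean approxEquals(double expected, double actual, double relTol) {
    if (expected == actual) {
      return true; // covers exact matches, including equal infinities
    }
    if (Double.isInfinite(expected) || Double.isInfinite(actual)) {
      return false; // differing infinities (or infinite vs finite) never match
    }
    double scale = Math.max(Math.abs(expected), Math.abs(actual));
    return Math.abs(expected - actual) <= relTol * scale; // false if either is NaN
  }

  public static void main(String[] args) {
    // The same three values summed in two different orders, the way two
    // different shard merge orders might end up doing it.
    double oneOrder = (0.1 + 0.2) + 0.3;
    double otherOrder = 0.1 + (0.2 + 0.3);

    System.out.println("exact match:  " + (oneOrder == otherOrder));
    System.out.println("approx match: " + approxEquals(oneOrder, otherOrder, 1e-9));
  }
}
```

The two sums differ in their last bits because floating point addition is not associative, which is exactly the sort of difference a distributed merge can introduce.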

One final note...

bq. we need to think carefully about "exclusions" because they might be 
different 

My current thinking (reflected in the steps i've outlined above) is that we 
should go this route...

bq. i think what we want in general is for the "ex" localparam of the 
stats.field to be ignored when hanging off of a facet.pivot

Of the 2 alternatives i proposed before: 

* "union the exclusions" -- extremely impractical.
* "fail if they both specify 'ex' and they aren't identical" -- very 
possible/easy to do if people think it's less confusing, it just requires a bit 
more code.  we can easily go this route if we run into problems and decide it 
makes the API cleaner.
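
For comparison, the "fail if they aren't identical" alternative could look roughly like the sketch below. Everything here is hypothetical: the method names are invented, and treating "identical" as "same set of tags regardless of order" is an assumption of this example, not something decided in this issue.

```java
import java.util.Set;
import java.util.TreeSet;

class ExclusionPolicySketch {

  /** Parses a comma separated "ex" local param value into a set of tag names. */
  static Set<String> parseEx(String ex) {
    Set<String> tags = new TreeSet<>();
    if (ex == null) {
      return tags;
    }
    for (String tag : ex.split(",")) {
      String trimmed = tag.trim();
      if (!trimmed.isEmpty()) {
        tags.add(trimmed);
      }
    }
    return tags;
  }

  /**
   * Fails only when both the pivot and the stats field specify "ex" and the
   * two sets of excluded tags differ (order is ignored; that is an assumption).
   */
  static void requireCompatible(String pivotEx, String statsEx) {
    if (pivotEx == null || statsEx == null) {
      return; // at most one side specifies exclusions, nothing to reconcile
    }
    if (!parseEx(pivotEx).equals(parseEx(statsEx))) {
      throw new IllegalArgumentException(
          "facet.pivot ex=" + pivotEx + " does not match stats.field ex=" + statsEx);
    }
  }

  public static void main(String[] args) {
    requireCompatible("a,b", "b, a"); // same tags, passes
    requireCompatible(null, "a");     // only one side specifies ex, passes
    try {
      requireCompatible("a", "b");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```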



> Let Stats Hang off of Pivots (via 'tag')
> ----------------------------------------
>
>                 Key: SOLR-6351
>                 URL: https://issues.apache.org/jira/browse/SOLR-6351
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Hoss Man
>
> The goal here is basically to flip the notion of "stats.facet" on its head, so 
> that instead of asking the stats component to also do some faceting 
> (something that's never worked well with the variety of field types and has 
> never worked in distributed mode) we instead ask the PivotFacet code to 
> compute some stats X for each leaf in a pivot.  We'll do this with the 
> existing {{stats.field}} params, but we'll leverage the {{tag}} local param 
> of the {{stats.field}} instances to be able to associate which stats we want 
> hanging off of which {{facet.pivot}}
> Example...
> {noformat}
> facet.pivot={!stats=s1}category,manufacturer
> stats.field={!key=avg_price tag=s1 mean=true}price
> stats.field={!tag=s1 min=true max=true}user_rating
> {noformat}
> ...with the request above, in addition to computing the min/max user_rating 
> and mean price (labeled "avg_price") over the entire result set, the 
> PivotFacet component will also include those stats for every node of the tree 
> it builds up when generating a pivot of the fields "category,manufacturer"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
