[
https://issues.apache.org/jira/browse/SOLR-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144161#comment-14144161
]
Hoss Man commented on SOLR-6351:
--------------------------------
I'm headed out of town for a week, but before i go -- while it's fresh in my
head -- i wanted to post some notes on what the next steps need to be for this
based on the current state of trunk (after the refactoring & cleanup done in
recent issues like SOLR-6354 & SOLR-6507)
{panel:title=next steps}
*Single Node Pivot Tests...*
# apparently we've managed to go this far w/o any simple single-node pivot tests
other than {{SolrExampleTests}} -- which requires solrj support. Since it
would be nice to start with some simple proof that single node pivots+stats
work, we need to start with some tests
# we should add a simple {{FacetPivotSmallTest}} that uses the same basic data
and assertions as {{DistributedFacetPivotSmallTest}} but with a single solr
node and using xpath (instead of depending on solrj).
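To make the "xpath instead of solrj" idea concrete, here is a rough sketch of the kind of assertion such a test could make, using only the JDK's {{javax.xml.xpath}}. The XML is a hand-written stand-in for a pivot response, and the field names, values, counts and the {{PivotXPathSketch}} helper are all made up for illustration; the real test would presumably go through Solr's existing test harness instead.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative only: the XML below is a hand-written stand-in for a pivot
// response, and PivotXPathSketch is a hypothetical helper, not Solr test code.
class PivotXPathSketch {

  static final String SAMPLE_RESPONSE =
      "<response>"
    + "<lst name=\"facet_counts\"><lst name=\"facet_pivot\">"
    + "<arr name=\"place_t,company_t\">"
    + "<lst><str name=\"field\">place_t</str><str name=\"value\">dublin</str>"
    + "<int name=\"count\">4</int>"
    + "<arr name=\"pivot\">"
    + "<lst><str name=\"field\">company_t</str><str name=\"value\">polecat</str>"
    + "<int name=\"count\">2</int></lst>"
    + "</arr></lst>"
    + "</arr>"
    + "</lst></lst>"
    + "</response>";

  /** Returns true if the xpath expression selects at least one node in the xml. */
  static boolean matches(String xml, String expr) {
    try {
      Object result = XPathFactory.newInstance().newXPath()
          .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
      return (Boolean) result;
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("bad xpath or xml: " + expr, e);
    }
  }

  public static void main(String[] args) {
    // Select the top-level pivot entry for one value, then check its count
    // and the count of one of its child pivot entries.
    String top = "//lst[@name='facet_pivot']/arr[@name='place_t,company_t']"
        + "/lst[str[@name='value']='dublin']";
    System.out.println(matches(SAMPLE_RESPONSE, top + "/int[@name='count'][.='4']"));
    System.out.println(matches(SAMPLE_RESPONSE,
        top + "/arr[@name='pivot']/lst[str[@name='value']='polecat']/int[@name='count'][.='2']"));
  }
}
```

Once stats hang off of pivots, the same style of expression could be extended to whatever sub-element the stats end up in.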
*Local Pivots + stats...*
# add some logic & a getter to {{StatsField}} to make public a list of the
"tags" in its local params
# add a {{Map<String,List<StatsField>>}} to {{StatsInfo}} to support a new
method for looking up the {{List<StatsField>}} corresponding to a given tag
string.
# Modify the Pivot facet code to check for a "stats" local param:
#* the value of which may be a comma separated list of "tags" to lookup with
the {{StatsInfo}} instance of the current {{ResponseBuilder}} to get the
{{StatsField}} instances we want to hang off of our pivots.
#* if there are some {{StatsFields}} to hang off of our pivot, then any code in
{{PivotFacetProcessor}} which currently calls {{getSubsetSize()}} should call
{{getSubset()}}; and after *any* call (existing or new) to {{getSubset()}} the
code should (in addition to adding the set size to the response) pass that
DocSet to the {{StatsField.computeLocalStatsValues}} and include the resulting
StatsValues in the response.
# update the previously created {{FacetPivotSmallTest}} to also test hanging
some stats off of pivots
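The tag bookkeeping in the steps above could look roughly like the sketch below. {{StatsFieldStub}}, {{indexByTag}} and {{resolve}} are hypothetical stand-ins invented for illustration; they are not the real {{StatsField}} / {{StatsInfo}} classes, and the de-duplication of fields that carry more than one requested tag is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; NOT the real Solr StatsField / StatsInfo classes.
class TagLookupSketch {

  /** Minimal stand-in for one stats.field: its key plus the tags from its local params. */
  static final class StatsFieldStub {
    final String key;
    final List<String> tags;

    StatsFieldStub(String key, String... tags) {
      this.key = key;
      this.tags = Arrays.asList(tags);
    }
  }

  /** Builds the tag -> fields index (the Map<String,List<StatsField>> idea). */
  static Map<String, List<StatsFieldStub>> indexByTag(List<StatsFieldStub> fields) {
    Map<String, List<StatsFieldStub>> byTag = new LinkedHashMap<>();
    for (StatsFieldStub field : fields) {
      for (String tag : field.tags) {
        byTag.computeIfAbsent(tag, t -> new ArrayList<>()).add(field);
      }
    }
    return byTag;
  }

  /** Resolves a comma separated "stats" local param value to the fields to hang off a pivot. */
  static List<StatsFieldStub> resolve(String statsLocalParam,
                                      Map<String, List<StatsFieldStub>> byTag) {
    if (statsLocalParam == null || statsLocalParam.trim().isEmpty()) {
      return Collections.emptyList();
    }
    // LinkedHashSet keeps request order and drops a field reached via two tags.
    Set<StatsFieldStub> found = new LinkedHashSet<>();
    for (String tag : statsLocalParam.split(",")) {
      found.addAll(byTag.getOrDefault(tag.trim(), Collections.emptyList()));
    }
    return new ArrayList<>(found);
  }

  /** Convenience for printing and asserting: the keys of the resolved fields, in order. */
  static List<String> keysOf(List<StatsFieldStub> fields) {
    List<String> keys = new ArrayList<>();
    for (StatsFieldStub field : fields) {
      keys.add(field.key);
    }
    return keys;
  }

  public static void main(String[] args) {
    Map<String, List<StatsFieldStub>> byTag = indexByTag(Arrays.asList(
        new StatsFieldStub("avg_price", "s1"),
        new StatsFieldStub("user_rating", "s1", "s2")));
    System.out.println(keysOf(resolve("s1,s2", byTag)));
  }
}
```

An empty result from the lookup is what would let the pivot code keep using the cheaper {{getSubsetSize()}} path.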
*SolrJ*
# update the SolrJ {{PivotField}} to support having a {{List<FieldStatsInfo>}}
in it
# update the solrj codecs to know how to populate those if/when the data exists
in the response
# add some unit tests for this in solrj (no existing unit tests of the pivot or
stats object creation from responses???)
# update {{SolrExampleTests}} to do some pivots+stats and verify that they can
be parsed correctly by solrj
*Distributed Pivot + Stats*
# {{PivotFacetValue}} needs to know if/when it should have one or more
{{StatsValues}} in it and get an empty instance from {{StatsValuesFactory}} for
each of the applicable {{StatsField}} instances.
# {{PivotFacetValue.createFromNamedLists}} needs to recognize when a shard is
including a sub-NamedList of stats data, and merge each of those children in
via the appropriate {{StatsValues.accumulate(NamedList)}} (based on
{{StatsField.getKey()}})
# at this point we should be able to update {{DistributedFacetPivotSmallTest}}
to include the same types of pivot+stats additions that were made to
{{FacetPivotSmallTest}} for checking the single node case, and see distributed
pivot+stats working.
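As a rough illustration of the per-shard merge described above, the sketch below keeps one accumulator per stats key and folds each shard's numbers into it. {{StatsAccumulator}} is a hypothetical stand-in for the real {{StatsValues}}, plain Maps stand in for the NamedLists, and only min/max/count/missing/sum are tracked (stddev would need more than this).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; NOT the real StatsValues / PivotFacetValue classes.
class ShardStatsMergeSketch {

  /** Starts out empty and accumulates the stats reported by each shard. */
  static final class StatsAccumulator {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    long count;
    long missing;
    double sum;

    void accumulate(Map<String, Number> shardStats) {
      long shardCount = shardStats.get("count").longValue();
      missing += shardStats.get("missing").longValue();
      if (shardCount == 0) {
        return; // shard saw no values, so its min/max/sum carry no information
      }
      count += shardCount;
      sum += shardStats.get("sum").doubleValue();
      min = Math.min(min, shardStats.get("min").doubleValue());
      max = Math.max(max, shardStats.get("max").doubleValue());
    }

    /** Derived after merging, never merged directly. */
    double mean() {
      return count == 0 ? Double.NaN : sum / count;
    }
  }

  /** Helper to build the stats one shard reports for one stats key. */
  static Map<String, Number> shardStats(double min, double max, long count,
                                        long missing, double sum) {
    Map<String, Number> stats = new LinkedHashMap<>();
    stats.put("min", min);
    stats.put("max", max);
    stats.put("count", count);
    stats.put("missing", missing);
    stats.put("sum", sum);
    return stats;
  }

  /** Each list element is one shard's stats for a single pivot value, keyed by stats key. */
  static Map<String, StatsAccumulator> merge(
      List<Map<String, Map<String, Number>>> shardPivotStats) {
    Map<String, StatsAccumulator> merged = new LinkedHashMap<>();
    for (Map<String, Map<String, Number>> oneShard : shardPivotStats) {
      for (Map.Entry<String, Map<String, Number>> entry : oneShard.entrySet()) {
        merged.computeIfAbsent(entry.getKey(), k -> new StatsAccumulator())
            .accumulate(entry.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Number>> shard1 = new LinkedHashMap<>();
    shard1.put("avg_price", shardStats(1.0, 5.0, 2, 0, 6.0));
    Map<String, Map<String, Number>> shard2 = new LinkedHashMap<>();
    shard2.put("avg_price", shardStats(0.5, 3.0, 3, 1, 4.5));
    StatsAccumulator price = merge(Arrays.asList(shard1, shard2)).get("avg_price");
    System.out.println(price.min + " " + price.max + " " + price.count + " " + price.mean());
  }
}
```

Keying the accumulators by the stats key is what lets a shard that returned no data for some key simply be skipped.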
*Test, Test, Test*
# at this point we should be able to update the other distributed pivot tests
with pivot + stats cases to make sure we don't find new bugs
# adding in stats params & assertions to {{DistributedFacetPivotLargeTest}} and
{{DistributedFacetPivotLongTailTest}} should be straightforward
# {{TestCloudPivotFacet}} will be more interesting due to the randomization...
#* adding new randomized {{stats.field}} params is trivial given all the
interesting fields already included in the docs
#* with a little record keeping of what {{stats.field}} params we add, we can
easily tweak the {{facet.pivot}} params to include a {{stats=...}} local param
to ask for them
#* we'll want a trace param to know if/when to expect stats in the response
(so we don't overlook bugs where stats are never computed/returned)
#* in {{assertPivotCountsAreCorrect}}, if stats are expected, then instead of a
simple {{assertNumFound}} on each of the pivot values, we can actually assert
that when filtering on that pivot value, the _stats_ we get back for the entire
(filtered) request match the stats we got back as part of the pivot (ie: don't
just check the {{pivot\[count\]==numFound}}, also check
{{pivot\[stats\]\[foo\]\[min\]==stats\[foo\]\[min\]}} and
{{pivot\[stats\]\[foo\]\[max\]==stats\[foo\]\[max\]}}, etc...)
#** the merge logic should be exact for the min/max/missing/count stats, but we
may need some leniency here for the comparisons of some of the computed stats
values like sum/mean/stddev/etc... since the order of operations involved in
the merge may cause intermediate precision loss
{panel}
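For the exact vs. lenient comparisons mentioned in the last testing step, something along these lines might do. It is only a sketch: {{PivotStatsCompareSketch}}, the split of stat names into two lists, and the relative tolerance are all assumptions, and plain Maps stand in for whatever the response objects end up being.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of TestCloudPivotFacet.
class PivotStatsCompareSketch {

  static final List<String> EXACT = Arrays.asList("min", "max", "missing", "count");
  static final List<String> LENIENT = Arrays.asList("sum", "mean", "stddev");

  /** Relative comparison, for values whose merge order can change the rounding. */
  static boolean closeEnough(double expected, double actual, double relTol) {
    if (Double.compare(expected, actual) == 0) {
      return true; // identical, including matching NaN or matching infinities
    }
    if (Double.isNaN(expected) || Double.isNaN(actual)
        || Double.isInfinite(expected) || Double.isInfinite(actual)) {
      return false;
    }
    double scale = Math.max(Math.abs(expected), Math.abs(actual));
    return Math.abs(expected - actual) <= relTol * scale;
  }

  /** Compares stats hanging off a pivot value with stats from the equivalent filtered request. */
  static void assertStatsMatch(Map<String, Double> pivotStats,
                               Map<String, Double> filteredStats, double relTol) {
    for (String name : EXACT) {
      check(name, pivotStats.get(name), filteredStats.get(name), 0.0);
    }
    for (String name : LENIENT) {
      check(name, pivotStats.get(name), filteredStats.get(name), relTol);
    }
  }

  private static void check(String name, Double pivot, Double filtered, double relTol) {
    if (pivot == null && filtered == null) {
      return; // stat was not requested on either side
    }
    if (pivot == null || filtered == null || !closeEnough(filtered, pivot, relTol)) {
      throw new AssertionError(name + ": pivot=" + pivot + " filtered=" + filtered);
    }
  }

  public static void main(String[] args) {
    Map<String, Double> pivot = new HashMap<>();
    pivot.put("min", 0.1);
    pivot.put("max", 0.3);
    pivot.put("sum", 0.1 + 0.2 + 0.3); // same values summed in a different order
    Map<String, Double> filtered = new HashMap<>(pivot);
    filtered.put("sum", 0.3 + 0.2 + 0.1);
    assertStatsMatch(pivot, filtered, 1e-9);
    System.out.println("stats match within tolerance");
  }
}
```

With a tolerance of 0.0 the lenient check degrades to an exact one, which is why the same helper is reused for both groups.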
One final note...
bq. we need to think carefully about "exclusions" because they might be
different
My current thinking (reflected in the steps i've outlined above) is that we
should go this route...
bq. i think what we want in general is for the "ex" localparam of the
stats.field to be ignored when hanging off of a facet.pivot
Of the 2 alternatives i proposed before:
* "union the exclusions" -- extremely impractical.
* "fail if they both specify 'ex' and they aren't identical" -- very
possible/easy to do if people think it's less confusing, it just requires a bit
more code. we can easily go this route if we run into problems and decide it
makes the API cleaner.
> Let Stats Hang off of Pivots (via 'tag')
> ----------------------------------------
>
> Key: SOLR-6351
> URL: https://issues.apache.org/jira/browse/SOLR-6351
> Project: Solr
> Issue Type: Sub-task
> Reporter: Hoss Man
>
> The goal here is basically to flip the notion of "stats.facet" on its head, so
> that instead of asking the stats component to also do some faceting
> (something that's never worked well with the variety of field types and has
> never worked in distributed mode) we instead ask the PivotFacet code to
> compute some stats X for each leaf in a pivot. We'll do this with the
> existing {{stats.field}} params, but we'll leverage the {{tag}} local param
> of the {{stats.field}} instances to be able to associate which stats we want
> hanging off of which {{facet.pivot}}
> Example...
> {noformat}
> facet.pivot={!stats=s1}category,manufacturer
> stats.field={!key=avg_price tag=s1 mean=true}price
> stats.field={!tag=s1 min=true max=true}user_rating
> {noformat}
> ...with the request above, in addition to computing the min/max user_rating
> and mean price (labeled "avg_price") over the entire result set, the
> PivotFacet component will also include those stats for every node of the tree
> it builds up when generating a pivot of the fields "category,manufacturer"