[
https://issues.apache.org/jira/browse/SOLR-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659854#comment-16659854
]
Hoss Man commented on SOLR-12839:
---------------------------------
{quote}"foo desc, bar asc 50" was an example of a single sort with tiebreak and
a limit (no resort). If one wanted a single string version ";" would be the
divider. For example adding a resort with a tiebreak: "foo desc, bar asc 50;
baz desc, qux asc 10"
{quote}
Ok ... I realize now that you were discussing two different ideas and giving two
different examples, and I was conflating them, but I'm still not certain what
you're saying the *behavior* of these examples would be, particularly because
(independent of the idea of resorting *AND* independent of the idea of
supporting tiebreakers on sort/resort syntax) you *ALSO* seem to be suggesting
a numeric "limit" that would be inlined as part of the sort/resort syntax, and
this confuses me in two orthogonal ways:
* are you suggesting this would be an alternative for the existing {{limit}}
param on these facets?
** if so, what would the behavior be if someone tried to do both, i.e. use the
"inline limit" *and* a "limit" param?
** if not, then what do you mean by "limit" in the above sentence?
* assuming you did mean it as a replacement/override of the existing {{limit}}
param, I don't understand your example, or what the value add is of asking solr
to resort the "top 10" by criteria "baz desc, qux asc" if we're already returning
the "top 50"
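For discussion purposes, here is one possible reading of that shorthand as a Python sketch: stages separated by ";", each stage a comma-separated list of "key asc|desc" criteria, optionally followed by a bare integer treated as that stage's inline limit. This is not syntax Solr supports; {{parse_sort_stages}} and the tuple shapes are hypothetical, and the sketch deliberately says nothing about how an inline limit would interact with the {{limit}} param, since that is exactly the open question.

```python
import re
from typing import List, Optional, Tuple

Criterion = Tuple[str, str]                    # (sort key, "asc" or "desc")
Stage = Tuple[List[Criterion], Optional[int]]  # (criteria, inline limit or None)

_CRITERION = re.compile(r"^\s*(\w+)\s+(asc|desc)\s*$")
_TRAILING_LIMIT = re.compile(r"^(.*?)\s+(\d+)$")


def parse_sort_stages(spec: str) -> List[Stage]:
    """Parse the hypothetical 'sort [limit]; resort [limit]' shorthand."""
    stages: List[Stage] = []
    for raw_stage in spec.split(";"):
        raw_stage = raw_stage.strip()
        limit: Optional[int] = None
        # A trailing bare integer is read as this stage's inline limit.
        m = _TRAILING_LIMIT.match(raw_stage)
        if m:
            raw_stage, limit = m.group(1), int(m.group(2))
        criteria: List[Criterion] = []
        for part in raw_stage.split(","):
            cm = _CRITERION.match(part)
            if cm is None:
                raise ValueError(f"bad sort criterion: {part!r}")
            criteria.append((cm.group(1), cm.group(2)))
        stages.append((criteria, limit))
    return stages
```

Under this reading, "foo desc, bar asc 50; baz desc, qux asc 10" becomes two stages, with criteria foo/bar and limit 50, then criteria baz/qux and limit 10; a plain "count desc" becomes a single stage with no inline limit.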
{quote}If there are use cases for starting with N sorted things and reducing
that to K with a different sort, then it's just sort of recursive. Why would
there be use cases for one resort and not two resorts?
...
One use case that comes to mind are stock screens I've seen that consist of
multiple sorting and "take top N" steps.
Example: Sort by current dividend yield and take the top 100, then sort those
by low PE and take the top 50, then sort those by total return 1 year and take
the top 10.
{quote}
...again: if this is a situation where solr is returning the top 100 buckets,
what's the value add in having solr resort the top 50 (and then the top 10
again) instead of just letting the client manipulate & re-order those same
buckets?
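For comparison, the multi-step screen from the quote takes only a few lines on the client once Solr has returned the first-stage buckets. A minimal Python sketch, assuming each returned bucket already carries the three metrics as plain numbers; the field names {{dividend_yield}}, {{pe}}, {{return_1y}} and the {{screen}} helper are hypothetical:

```python
from typing import Dict, List, Tuple

Bucket = Dict[str, float]
Step = Tuple[str, bool, int]  # (metric, descending?, how many to keep)


def screen(buckets: List[Bucket], steps: List[Step]) -> List[Bucket]:
    """Apply successive 'sort by X, keep top N' steps on the client."""
    for metric, descending, n in steps:
        buckets = sorted(buckets, key=lambda b: b[metric], reverse=descending)[:n]
    return buckets


# The screen from the quote: highest dividend yield (top 100), then lowest
# PE (top 50), then best 1-year total return (top 10).
STOCK_SCREEN: List[Step] = [
    ("dividend_yield", True, 100),
    ("pe", False, 50),
    ("return_1y", True, 10),
]
```

Only the first step needs Solr's help (choosing the top 100 out of all buckets); every later step only reorders and trims buckets the client already has.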
----
I feel like maybe there is a disconnect in the _principle_ of the ideas we are
discussing?
As I mentioned when I created this issue, the overall goal I'm trying to
address is to mirror the concept of the "reranking query" at a facet bucket
level ... to address the performance cost of sorting by something
complex/expensive.
* Today you can ask solr:
** "Compute {{expensive_function()}} for every bucket that exists, and sort all
the buckets by that function, then return the top {{$limit}} buckets"
* I want to be able to tell solr:
** "Compute {{cheaper_aproximation_of_expensive_function()}} for every bucket
that exists, sort all the buckets by that function, and compute
{{expensive_function()}} only for the top candidate buckets, then (once
refinement/merging is complete) resort just the fully populated buckets by
{{expensive_function()}}"
...note in particular that I'm not even suggesting any sort of new
{{resort_limit}} option or any hard and fast guarantees on the number of
buckets that are "resorted" – just a way to tell solr "during the first pass,
you can use this cheap function instead of the final expensive function i
really care about" ... in essence just a "performance hint" or "save some CPU
cycles" type feature
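The two-phase idea above can be sketched in Python for a single node, ignoring distributed refinement: rank every bucket by the cheap function, compute the expensive function only for the top candidates, then resort those by the expensive value. {{approx_then_exact}} is a hypothetical name, and {{num_candidates}} stands in for whatever internal candidate count an implementation might pick (something like limit+overrequest); consistent with the note above, it is not meant as a user-facing option.

```python
import heapq
from typing import Callable, List, Tuple, TypeVar

B = TypeVar("B")


def approx_then_exact(
    buckets: List[B],
    cheap: Callable[[B], float],
    expensive: Callable[[B], float],
    limit: int,
    num_candidates: int,
) -> List[Tuple[B, float]]:
    # Phase 1: rank every bucket by the cheap approximation and keep only the
    # top num_candidates (heapq.nlargest avoids fully sorting all buckets).
    candidates = heapq.nlargest(num_candidates, buckets, key=cheap)
    # Phase 2: pay for the expensive function only on those candidates.
    scored = [(b, expensive(b)) for b in candidates]
    # Resort the candidates by the expensive value and return the top `limit`.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

The expensive function runs {{num_candidates}} times instead of once per bucket; the cost is that a bucket ranked poorly by the cheap function is never scored exactly.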
What you're describing on the other hand seems to be more akin to a "i want
specific operations to be performed on my buckets" type feature ... the
examples you're describing sound almost like a subset of a more robust
scripting type functionality, or at the very least a multi stage "post
processing" that might include filtering or collapsing of buckets?
...Lemme come back to this conceptual disconnect in a minute...
----
{quote}Anyway we don't have to worry about multiple resorts now as long as we
can unambiguously upgrade if desired later (i.e. whatever the resort spec looks
like, if we can unambiguously wrap an array around it later and specify
multiple of them, then we're good)
{quote}
right ... but if you're trying to future-proof the API, there's also the
question of what "tiebreakers" look like when using the (existing) JSON object
syntax for sorting instead of just the shorthand string syntax.
That is, if you completely ignore the concept of "resorting", today we support
this...
{noformat}
json.facet={
  categories : {
    type : terms,
    field : cat,
    limit : 5,
    facet : { x : "sum(div(popularity,price))" },
    // can use shorthand of "x desc"
    sort : { x : desc },
  }
}
{noformat}
...and if you assume you want to have multiple tiebreaker sorts then that would
be something like...
{noformat}
json.facet={
  categories : {
    type : terms,
    field : cat,
    limit : 5,
    facet : { x : "sum(div(popularity,price))" },
    // can use shorthand of "x desc, count desc"
    sort : [ { x : desc },
             { count : desc } ],
  }
}
{noformat}
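To make the tiebreaker question concrete: whatever the final syntax, the shorthand string, the single JSON object, and the (hypothetical) array-of-objects form should all reduce to the same ordered list of (key, direction) pairs. A Python sketch of that normalization plus a multi-key bucket sort; {{normalize_sort}} and {{sort_buckets}} are illustrative names, not Solr APIs, and each JSON object is assumed to hold exactly one key/direction pair.

```python
from functools import cmp_to_key
from typing import Any, Dict, List, Tuple, Union

SortSpec = Union[str, Dict[str, str], List[Dict[str, str]]]


def normalize_sort(spec: SortSpec) -> List[Tuple[str, str]]:
    """Reduce any of the three sort shapes to an ordered list of (key, dir)."""
    if isinstance(spec, str):
        pairs = [tuple(part.split()) for part in spec.split(",")]
    else:
        objs = [spec] if isinstance(spec, dict) else spec
        # Each JSON object is assumed to hold exactly one key: direction pair.
        pairs = [next(iter(obj.items())) for obj in objs if len(obj) == 1]
        if len(pairs) != len(objs):
            raise ValueError("each sort object needs exactly one key")
    criteria: List[Tuple[str, str]] = []
    for pair in pairs:
        if len(pair) != 2 or pair[1] not in ("asc", "desc"):
            raise ValueError(f"bad sort criterion: {pair!r}")
        criteria.append((pair[0], pair[1]))
    return criteria


def sort_buckets(buckets: List[Dict[str, Any]], spec: SortSpec) -> List[Dict[str, Any]]:
    criteria = normalize_sort(spec)

    def compare(a: Dict[str, Any], b: Dict[str, Any]) -> int:
        # Later criteria only matter when all earlier ones tie (tiebreakers).
        for key, direction in criteria:
            if a[key] != b[key]:
                result = -1 if a[key] < b[key] else 1
                return -result if direction == "desc" else result
        return 0

    return sorted(buckets, key=cmp_to_key(compare))
```

If every sort option (today's {{sort}}, and any future resort-style option) goes through one normalization step like this, adding tiebreakers to one option adds them to all of them.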
...so in a hypothetical future world where you can have multiple "resort"
options, each of which can be an array of sort criteria JSON objects (which may
or may not have their own "intermediate limits"), what might we want that to
look like?
----
...revisiting the conceptual disconnect I mentioned above: let's assume (since
you've clearly already thought about it) that somewhere down the road we
definitely do want more robust options for resorting/filtering/reducing
buckets. Then maybe the best way to move forward _now_ with a short-term
improvement for the sorting/resorting on an {{expensive_function()}} vs
{{cheaper_aproximation_of_expensive_function()}} performance hint/optimization,
in a way that wouldn't hinder us down the road with more full-featured bucket
processing, would be to "invert" the API I proposed/implemented and rename
the option...
* instead of adding a {{resort}} option, add an {{approximate_sort}} option
and flip the meaning from what i was originally thinking...
{noformat}
json.facet={
  categories : {
    type : terms,
    field : cat,
    limit : 5,
    sort : "consumer_value desc",
    approximate_sort : "count desc",
    facet : { consumer_value : "sum(div(popularity,price))" }
  }
}
{noformat}
* instead of using {{sort}} in phase#1, and resorting on the {{resort}} option
after merging, we would use {{approximate_sort}} in phase#1, and {{sort}} in
phase#2
* from an end user perspective {{sort}} is the most important thing:
** the buckets to be returned will be in {{sort}} order
** {{approximate_sort}} is an expert level option only used as an inexpensive
approximation when picking the buckets to be returned, where the exact number
of buckets considered in this way is essentially an implementation detail
* down the road, when we add "tiebreaker" support to "sort type options", it
should be fairly trivial to support it for both options (with some
generalization of the helper methods added in the patch)
* farther down the road, if we want a more robust "sort, resort, process,
filter the buckets returned" type option/syntax to replace the {{sort}} param –
then we can add that independent of the {{approximate_sort}} option, and
{{approximate_sort}} can still be used as a way to indicate an "inexpensive"
way to initially rank the buckets to consider before any of that new
functionality.
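A rough Python sketch of the proposed {{approximate_sort}} flow across shards, assuming each shard exposes its matching docs per term as (popularity, price) pairs: phase#1 picks candidates per shard by "count desc", refinement gathers complete data for those candidates, and phase#2 orders them by the real {{sort}} ("consumer_value desc"). This is a simplification for discussion, not how the facet module's refinement is actually implemented; {{facet_with_approximate_sort}} and its {{overrequest}} handling are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Doc = Tuple[float, float]      # (popularity, price) for one matching doc
Shard = Dict[str, List[Doc]]   # term -> matching docs on this shard


def consumer_value(docs: List[Doc]) -> float:
    # Mirrors the facet function sum(div(popularity,price)).
    return sum(pop / price for pop, price in docs)


def facet_with_approximate_sort(
    shards: List[Shard], limit: int, overrequest: int
) -> List[Tuple[str, float]]:
    # Phase#1: each shard ranks its terms by the cheap approximate_sort
    # ("count desc") and nominates its top limit+overrequest terms.
    candidates: Set[str] = set()
    for shard in shards:
        ranked = sorted(shard.items(), key=lambda kv: len(kv[1]), reverse=True)
        candidates.update(term for term, _ in ranked[: limit + overrequest])
    # Refinement: collect every shard's docs for each candidate so the
    # expensive value is computed over complete data, not one shard's slice.
    scored: List[Tuple[str, float]] = []
    for term in candidates:
        docs = [d for shard in shards for d in shard.get(term, [])]
        scored.append((term, consumer_value(docs)))
    # Phase#2: order by the user-facing sort ("consumer_value desc"),
    # breaking ties by term so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]
```

Because phase#1 only looks at counts, a low-count term with a very high consumer_value can be missed unless enough candidates are considered; that is the trade-off a user would be explicitly opting into by setting {{approximate_sort}}.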
WDYT?
> add a 'resort' option to JSON faceting
> --------------------------------------
>
> Key: SOLR-12839
> URL: https://issues.apache.org/jira/browse/SOLR-12839
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Facet Module
> Reporter: Hoss Man
> Assignee: Hoss Man
> Priority: Major
> Attachments: SOLR-12839.patch, SOLR-12839.patch
>
>
> As discussed in SOLR-9480 ...
> bq. Similar to how the {{rerank}} request param allows people to collect &
> score documents using a "cheap" query, and then re-score the top N using a
> more expensive query, I think it would be handy if JSON Facets supported a
> {{resort}} option that could be used on any FacetRequestSorted instance right
> alongside the {{sort}} param, using the same JSON syntax, so that clients
> could have Solr internally sort all the facet buckets by something simple
> (like count) and then "Re-Sort" the top N=limit (or maybe
> N=limit+overrequest?) using a more expensive function like skg()
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)