[
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124591#comment-13124591
]
Hoss Man commented on SOLR-2366:
--------------------------------
Jan: I've got to be completely honest here -- catching up on this issue, I got
really confused and lost by some of your comments and the updated docs.
This sequence of comments really stands out to me...
{quote}
I have no good answer to this, other than inventing some syntax.
...
I think the values facet.range.include=upper/lower is clear. Outer/edge would
need some more work/definition.
...
*My primary reason for suggesting this is to give users a terse, intuitive
syntax for ranges.*
...
One thing this improvement needs to tackle is how to return the range buckets
in the Response. It will not be enough with the simple range_facet format ...
We need something which can return the explicit ranges,
{quote}
(emphasis added by me)
I really liked the simplicity of your earlier proposal, and I agree that it
would be really powerful/helpful to give users a terse, intuitive syntax for
specifying sequential ranges of variable sizes -- but it seems like we're
really moving away from the syntax being "intuitive" because of the hoops
you're having to jump through to treat this as an extension of the existing
"facet.range" param in your design.
I think we really ought to revisit my earlier suggestion to approach this as an
entirely new "type" of faceting - not a new plugin or a contrib, but a new
first-class type of faceting that FacetComponent would support, right along
side facet.field, facet.query, and facet.range. Let's ignore everything about
the existing facet.range.* param syntax, and the facet_range response format,
and think about what makes the most sense for this feature on its own. If
there are ideas from facet.range that make sense to carry over (like
facet.range.include) then great -- but let's approach it from the "something
new that can borrow from facet.range" standpoint instead of the "extension to
facet.range that has a bunch of caveats with how facet.range already works".
I mean: if it looks like a duck, walks like a duck, and quacks like a duck,
then i'm happy to call it a duck -- but in this case:
* doesn't make sense with facet.range.other
* needs special start/end syntax to play nice with facet.range.start/end
* needs to change the response format
...ie: it doesn't look the same, it doesn't walk the same, and it doesn't quack.
---
Regardless of whether this functionality becomes part of facet.range or not, I
wanted to comment specifically on this idea...
bq. If all gaps are specified as explicit ranges this is no ambiguity, so we
could require all gaps to be explicit ranges if one wants to use it?
This seems like a really harsh limitation to impose. If the only way to use an
explicit range is in use cases where you *only* use explicit ranges, then what
value add does this feature give you over just using multiple facet.query
params? (it might be marginally fewer characters, but multiple facet.query
params seem more intuitive and easier to read). I mean: I don't have a
solution to propose, it just seems like there's not much point in supporting
explicit ranges in that case.
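To make the comparison concrete, here is a rough sketch of what "just using multiple facet.query params" looks like for a handful of explicit ranges. The field name "price" and the helper facet_query_params are made up for illustration; the only Solr pieces assumed are the existing facet and facet.query params and the standard [low TO high] range query syntax, which includes both end points.

```python
from urllib.parse import urlencode


def facet_query_params(field, ranges):
    """Build one facet.query param per explicit (low, high) range."""
    params = [("facet", "true")]
    for low, high in ranges:
        # [low TO high] is Solr range query syntax, inclusive on both ends.
        params.append(("facet.query", f"{field}:[{low} TO {high}]"))
    return params


# Hypothetical field name and ranges, purely for illustration.
params = facet_query_params("price", [(0, 10), (0, 20), (0, 50), (0, 100)])
print(urlencode(params))
```

Every range costs one full param, but each one is readable on its own, which is the point being made above.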
---
Having not thought about this issue in almost a month, and revisiting it with
(fairly) fresh eyes, and thinking about all the use cases that have been
discussed, it seems like the main goals we should address are really:
* an intuitive syntax for specifying end points for ranges of varying sizes
* ability to specify range end points using either fixed values or increments
* ability to specify that ranges should either use sequential end points,
or overlap relative to some fixed min/max value
In other words: the only reason (that i know of) why overlapping ranges even
came up in this issue was use cases like...
{noformat}
Price: $0-10, $0-20, $0-50, $0-100
Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
{noformat}
...there doesn't seem to be a lot of motivation for using overlapping ranges
in the "middle" of a sequence, and these types of use cases where *all* the
ranges overlap seem just as important as use cases where the ranges don't
overlap at all...
{noformat}
Price: $0-10, $10-20, $20-50, $50-100
Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
{noformat}
...so let's try to focus on a syntax that makes both easy, using both fixed and
relative values, w/o worrying about supporting arbitrary overlapping ranges
(since I can't think of a use case for it, and it could always be achieved
using facet.query)
So how about something like...
{noformat}
facet.sequence=<fieldname>
facet.sequence.spec=[<wild>,]?<val>,<relval>[,<relval>]*[,<wild>]?
facet.sequence.type=[before|after|between]
facet.sequence.include=(same as facet.range.include)
{noformat}
Where "relval" would either be a concrete value, or a relative value; the
effective sequence has to either increase or decrease consistently or it's an
error; and "facet.sequence.type" determines whether the ranges are overlapping
("before" and "after") or not ("between")
So if you had a spec like this...
{noformat}
facet.sequence.spec=0,10,+10,50,+50
{noformat}
Then depending on facet.sequence.type you could either get...
{noformat}
facet.sequence.type=after
Price: $0-10, $0-20, $0-50, $0-100
facet.sequence.type=between
Price: $0-10, $10-20, $20-50, $50-100
facet.sequence.type=before
Price: $0-100, $10-100, $20-100, $50-100
{noformat}
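As a sanity check on those semantics, here is a minimal sketch of how a numeric spec could be expanded. Nothing here is existing Solr code: resolve_points and build_ranges are hypothetical helpers, and the sketch assumes that a token with a leading "+" or "-" is an offset from the previous end point, that "after" anchors every range at the lowest end point, and that "before" anchors at the highest. Wildcards and date math are left out.

```python
def resolve_points(spec):
    """Resolve a spec such as '0,10,+10,50,+50' into absolute end points."""
    points = []
    for token in (t.strip() for t in spec.split(",")):
        if token[0] in "+-":
            # Assumption: a signed token is an offset from the previous point.
            if not points:
                raise ValueError("spec must start with a concrete value")
            points.append(points[-1] + float(token))
        else:
            points.append(float(token))
    # The effective sequence must consistently increase or decrease.
    steps = [b - a for a, b in zip(points, points[1:])]
    if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
        raise ValueError("sequence must consistently increase or decrease")
    return points


def build_ranges(points, seq_type):
    """Return (low, high) pairs for 'after', 'between' or 'before'."""
    if seq_type == "between":
        # Non-overlapping: each range spans two adjacent end points.
        return [(min(a, b), max(a, b)) for a, b in zip(points, points[1:])]
    if seq_type == "after":
        # Overlapping: every range starts at the lowest end point.
        anchor = min(points)
        return [(anchor, p) for p in points if p != anchor]
    if seq_type == "before":
        # Overlapping: every range ends at the highest end point.
        anchor = max(points)
        return [(p, anchor) for p in points if p != anchor]
    raise ValueError("unknown type: " + seq_type)


points = resolve_points("0,10,+10,50,+50")
for seq_type in ("after", "between", "before"):
    print(seq_type, build_ranges(points, seq_type))
```

With the spec above this yields the same end points as the three Price listings, in the same order.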
"*" could be used at the start or end to indicate that you wanted an unbounded
range, but it wouldn't be a factor in determining the "fixed point" used if
type was "after" or "before", ie...
{noformat}
f.price.facet.sequence.spec=*,0,10,+10,50,+50,*
f.created.facet.sequence.spec=NOW,-1DAY,-1MONTH,-1YEAR
facet.sequence.type=after
Price: below $0, $0-10, $0-20, $0-50, $0-100, $100 and up
Created: NOW-1YEAR TO NOW, NOW-1YEAR TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
facet.sequence.type=between
Price: below $0, $0-10, $10-20, $20-50, $50-100, $100 and up
Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
facet.sequence.type=before
Price: below $0, $0-100, $10-100, $20-100, $50-100, $100 and up
Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
{noformat}
...if we defined things that way, i *think* that would simplify a lot of the
complexity we've been talking about, and simplify some of the use cases.
The only remaining issues that have been brought up (that i can think of) that
would still need to be worked out would be:
1) what the response format needs to look like - I'd vote to punt on this until
we figure out the semantics.
2) when exactly ranges are inclusive/exclusive of their endpoints - i *think*
we should be able to reuse the semantics from facet.range.include here, including
"edge", if we define ranges involving "*" as "outer" ranges, but we'd need to
work through more scenarios to be sure.
3) what happens if an increment overlaps with an absolute value, ie: my
original example of "10,20,+50,+100,120,150". The three possible solutions I
can think of are:
* fail loudly
* implement "precedence" rules, ie: that absolute values trump relative values
(10-20,20-70,70-120,120-150) or vice-versa (10-20,20-70,70-170)
* implement precedence rules but let them be controlled via a request param
(similar to how "facet.range.hardend" works)
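To compare the three options side by side, here is a rough sketch for increasing numeric sequences only. The resolve function and the policy names "fail", "absolute" and "relative" are invented for this illustration and do not correspond to any existing or proposed request param. A conflict is taken to mean that an absolute value does not extend the sequence built so far.

```python
def resolve(spec, policy="fail"):
    """Resolve end points, settling absolute/relative conflicts by policy."""
    points = []  # (value, is_relative) pairs, increasing by value
    for token in (t.strip() for t in spec.split(",")):
        if token.startswith("+"):
            if not points:
                raise ValueError("spec must start with a concrete value")
            # Relative value: offset from the previous end point.
            points.append((points[-1][0] + float(token), True))
            continue
        value = float(token)
        if points and value <= points[-1][0]:
            if policy == "absolute":
                # Absolute trumps relative: drop overshooting relative points.
                while points and points[-1][1] and points[-1][0] >= value:
                    points.pop()
            elif policy == "relative" and points[-1][1]:
                # Relative trumps absolute: ignore the overlapped value.
                continue
            if policy == "fail" or (points and points[-1][0] >= value):
                raise ValueError("conflicting end point: " + token)
        points.append((value, False))
    return [v for v, _ in points]


spec = "10,20,+50,+100,120,150"
print(resolve(spec, "absolute"))
print(resolve(spec, "relative"))
```

For the example spec, "absolute" gives the end points behind 10-20,20-70,70-120,120-150 and "relative" gives those behind 10-20,20-70,70-170, while the default "fail" raises an error.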
---
What do you think? Are there any key use cases / features we've talked about
that you think this approach overlooks? Do you still think it should really be
an extension to "facet.range" ?
> Facet Range Gaps
> ----------------
>
> Key: SOLR-2366
> URL: https://issues.apache.org/jira/browse/SOLR-2366
> Project: Solr
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting
> needs to be evenly spaced. For instance, if and when SOLR-1581 is completed
> and one were doing spatial distance calculations, one could facet by function
> into 3 different sized buckets: walking distance (0-5KM), driving distance
> (5KM-150KM) and everything else (150KM+), for instance. We should be able to
> quantize the results into arbitrarily sized buckets.
> (Original syntax proposal removed, see discussion for concrete syntax)