[
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996154#comment-12996154
]
Hoss Man commented on SOLR-2366:
--------------------------------
(FYI: i haven't looked at the patch, because i'm trying to focus on 3.1 bug
fixes, but grant specifically called me out on this on irc, so i'm replying
based purely on the comments)
bq. I think it's a lot less confusing. You only have to express start, end and
the size of the buckets you want. With facet.query, you have to write out each
expression for every bucket and do the math on all the boundaries.
ok ... fair enough, i can't deny the syntax you are proposing would be _easier_
than specifying individual facet.query params, i'm just not convinced it would
be completely intuitive. If i told someone about this feature, and then showed
them this request...
{code}facet.range.start=10&facet.range.end=100&facet.range.gap=10,20,50{code}
I would be hard pressed to explain why the resulting ranges were...
{code}10-20, 20-40, 40-90, and 90-140{code}
...instead of...
{code}10-20, 10-30, 10-60, and 60-110{code}
(bearing in mind: facet.range.hardend defaults to "false")
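To make the ambiguity concrete, here is a minimal sketch of the two readings of a gap list. The function names are made up for illustration and this is not how the patch or Solr computes ranges; it assumes integer values, hardend=false, and that the last gap repeats until the end is reached.

```python
def cumulative_ranges(start, end, gaps):
    """Reading 1: each gap is the width of the next bucket; the last gap repeats."""
    ranges, low, i = [], start, 0
    while low < end:
        gap = gaps[min(i, len(gaps) - 1)]
        ranges.append((low, low + gap))  # no clamping, i.e. hardend=false
        low += gap
        i += 1
    return ranges


def anchored_ranges(start, end, gaps):
    """Reading 2: each listed gap is measured from start; the last gap then repeats."""
    ranges = [(start, start + g) for g in gaps]
    low = start + gaps[-1]
    while low < end:
        ranges.append((low, low + gaps[-1]))
        low += gaps[-1]
    return ranges


print(cumulative_ranges(10, 100, [10, 20, 50]))
print(anchored_ranges(10, 100, [10, 20, 50]))
```

Both functions accept exactly the same three inputs and both give a defensible answer, which is the problem: nothing in the request says which reading is meant.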
the existing start/end/gap params may not be 100% intuitive purely by name, but
once you read about them once, they are fairly easy to grasp and not very
confusing at all when you read examples later. likewise, a collection of
facet.query objects is fairly intuitive and unambiguous. I just don't feel
that way about what you are suggesting (then again: i unleashed "mm" on the
world, so i'm not really in a good position to throw stones)
I'm also not convinced that it really makes sense in use cases like this (where
you want variable sized buckets) to specify the *gap sizes* as a list, instead
of specifying the *boundaries* of each bucket.
What you are describing almost feels like it should be a new category of
faceting -- or a variation on range faceting that doesn't involve the
start/end/gap params at all (but could still respect facet.range.include and
facet.range.other)
Here's my counter-proposal/suggestion...
I'm imagining a facet.range.buckets param that (if present) would override
facet.range.gap, facet.range.start, and facet.range.end (so using facet.range
would require *either* bucket or start/end/gap). facet.range.buckets would
take a comma separated list of values representing the specific values you
wanted to see used to define adjoining range boundary points, with some syntax
("..." seems natural) indicating "repeat the last range size until you reach
this next value"
so you could say...
{code}facet.range=price&facet.range.buckets=0,10,25,50,100,...,300{code}
...and the resulting ranges computed would be...
{code}0-10, 10-25, 25-50, 50-100, 100-150, 150-200, 200-250, 250-300{code}
...likewise you could say...
{code}facet.range=age&facet.range.buckets=0,1,...,18,25,40,60,...,100{code}
...and you would get ranges for each year from 0 to 18, followed by 18-25,
25-40, 40-60, 60-80, 80-100.
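A rough sketch of how such a boundary list could be expanded. facet.range.buckets is only a proposal in this comment, and expand_buckets is a hypothetical helper, not an existing Solr API. It assumes ascending integer boundaries and deliberately rejects the case where the repeated size does not land exactly on the next boundary, since that is a separate design question.

```python
def expand_buckets(spec):
    """Turn a boundary list such as "0,10,25,50,100,...,300" into (low, high) ranges."""
    tokens = [t.strip() for t in spec.split(",")]
    bounds = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "...":
            # "..." repeats the size of the last explicit range up to the next boundary.
            if len(bounds) < 2 or i + 1 >= len(tokens):
                raise ValueError("'...' needs two boundaries before it and one after")
            size = bounds[-1] - bounds[-2]
            limit = int(tokens[i + 1])
            if size <= 0 or (limit - bounds[-1]) % size != 0:
                # Uneven fill is left undefined in this sketch.
                raise ValueError("repeated size does not land on the next boundary")
            while bounds[-1] < limit:
                bounds.append(bounds[-1] + size)
            i += 2  # the limit token has been consumed as the final boundary
        else:
            bounds.append(int(tokens[i]))
            i += 1
    # Adjacent boundaries form the ranges.
    return list(zip(bounds, bounds[1:]))


print(expand_buckets("0,10,25,50,100,...,300"))
print(expand_buckets("0,1,...,18,25,40,60,...,100")[-5:])
```

The point of the sketch is that every range edge either appears literally in the request or is one fixed step away from a value that does.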
The tricky situations would be things like...
# {code}facet.range.buckets=0,2,5,...,10{code}
# {code}facet.range.buckets=0,7,...,10,20{code}
...the first _could_ be dealt with using facet.range.hardend like we do today
(so the resulting buckets were "0-2,2-5,5-8,8-11") but i don't think it should.
I think it should result in "0-2,2-5,5-8,8-10" ... it's hard to imagine
letting a param like facet.range.hardend override the explicit "10" in the
buckets list when we don't have programmatically generated buckets of precisely
the same size, particularly when you consider the implications that would carry
over to the second case (i *really* can't imagine letting that produce any
ranges other than "0-7,7-10,10-20")
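The two options for the repeated portion can be sketched as follows. repeat_fill is a hypothetical helper and the clamp flag is only an illustration of the choice being argued about, not an existing parameter. In the first case the quoted results imply a repeated size of 3 starting at 5; in the second, a size of 7 starting at 7.

```python
def repeat_fill(last, size, limit, clamp=True):
    """Repeat ranges of width `size` from `last` until `limit` is reached."""
    ranges = []
    low = last
    while low < limit:
        high = low + size
        if clamp:
            high = min(high, limit)  # never cross the explicit boundary
        ranges.append((low, high))
        low = high
    return ranges


# First case: repeat size 3 from 5 up to the explicit 10.
print(repeat_fill(5, 3, 10, clamp=True))
print(repeat_fill(5, 3, 10, clamp=False))

# Second case: repeat size 7 from 7 up to the explicit 10, which is followed by 20.
print(repeat_fill(7, 7, 10, clamp=True))
print(repeat_fill(7, 7, 10, clamp=False))
```

Without clamping, the second case yields a 7-14 range that overlaps the explicit 10-20 range, which is why treating the listed boundary as authoritative seems like the only sane behavior.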
So yeah ... that's what i think would make more sense than letting you specify
a comma separated list in the "gaps" param ... fundamentally i think it comes
down to the point i alluded to earlier in this comment: is specifying a
sequence of varying gap sizes more intuitive for this type of use case than
specifying a sequence of boundary points? i don't think it is.
(PS: i think the discussion about dynamically generating range points based on
stats in the index should really be tracked in a distinct issue ... it's got a
lot of complexity to it that we've talked about on the mailing list a few times
that i don't really want to try and get into now)
> Facet Range Gaps
> ----------------
>
> Key: SOLR-2366
> URL: https://issues.apache.org/jira/browse/SOLR-2366
> Project: Solr
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting
> needs to be evenly spaced. For instance, if and when SOLR-1581 is completed
> and one were doing spatial distance calculations, one could facet by function
> into 3 different sized buckets: walking distance (0-5KM), driving distance
> (5KM-150KM) and everything else (150KM+), for instance. We should be able to
> quantize the results into arbitrarily sized buckets. I'd propose the syntax
> to be a comma separated list of sizes for each bucket. If only one value is
> specified, then it behaves as it currently does. Otherwise, it creates the
> different size buckets. If the number of buckets doesn't evenly divide up
> the space, then the size of the last bucket specified is used to fill out the
> remaining space (not sure on this)
> For instance,
> facet.range.start=0
> facet.range.end=400
> facet.range.gap=5,25,50,100
> would yield buckets of:
> 0-5,5-30,30-80,80-180,180-280,280-380,380-400