[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996154#comment-12996154
 ] 

Hoss Man commented on SOLR-2366:
--------------------------------

(FYI: i haven't looked at the patch, because i'm trying to focus on 3.1 bug 
fixes, but grant specifically called me out on this on irc, so i'm replying 
based purely on the comments)

bq. I think it's a lot less confusing. You only have to express start, end and 
the size of the buckets you want. With facet.query, you have to write out each 
expression for every bucket and do the math on all the boundaries.

ok ... fair enough, i can't deny the syntax you are proposing would be _easier_ 
than specifying individual facet.query params, i'm just not convinced it would 
be completely intuitive.  If i told someone about this feature, and then showed 
them this request...

{code}facet.range.start=10&facet.range.end=100&facet.range.gap=10,20,50{code}

I would be hard pressed to explain why the resulting ranges were...

{code}10-20, 20-40, 40-90, and 90-140{code}

...instead of...

{code}10-20, 10-30, 10-60, and 60-110{code}

(bearing in mind: facet.range.hardend defaults to "false")
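For reference, the first reading (each gap starts where the previous bucket ended, and the last gap repeats until the end is reached) can be sketched roughly as below. This is only an illustration of the semantics described in the issue: {{gap_ranges}} and its arguments are made-up names, not anything from Solr or the attached patch.

```python
def gap_ranges(start, end, gaps, hardend=False):
    """Expand a list of gap sizes into (low, high) ranges.

    Gaps are consumed in order; once the list runs out, the last gap
    is repeated until `end` is reached. (Hypothetical helper, not Solr code.)
    """
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("gaps must be a non-empty list of positive sizes")
    ranges = []
    low = start
    i = 0
    while low < end:
        # Reuse the last gap once the explicit list is exhausted.
        gap = gaps[min(i, len(gaps) - 1)]
        high = low + gap
        if hardend and high > end:
            high = end  # clamp the final bucket to the requested end
        ranges.append((low, high))
        low = high
        i += 1
    return ranges


# The example from the issue description, with hardend switched on.
print(gap_ranges(0, 400, [5, 25, 50, 100], hardend=True))
```

With hardend on, the issue description's own example (start=0, end=400, gap=5,25,50,100) comes out as 0-5, 5-30, 30-80, 80-180, 180-280, 280-380, 380-400. With hardend off, the last bucket keeps its full gap size and runs past the requested end.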

the existing start/end/gap params may not be 100% intuitive purely by name, but 
once you read about them once, they are fairly easy to grasp and not very 
confusing at all when you read examples later.  likewise, a collection of 
facet.query objects is fairly intuitive and unambiguous.  I just don't feel 
that way about what you are suggesting (then again: i unleashed "mm" on the 
world, so i'm not really in a good position to throw stones)
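To make the facet.query comparison concrete, here is a small sketch that writes out one facet.query per bucket from a list of boundary points. The field name "price" and the helper {{facet_query_params}} are assumptions for illustration only. Note that the {{[a TO b]}} range syntax is inclusive at both ends, so a value sitting exactly on a boundary would be counted in two adjoining buckets.

```python
from urllib.parse import urlencode


def facet_query_params(field, boundaries):
    """Build one facet.query param per adjoining pair of boundaries.

    Hypothetical helper: it only assembles request parameters.
    """
    pairs = zip(boundaries, boundaries[1:])
    params = [("facet", "true")]
    params += [("facet.query", f"{field}:[{lo} TO {hi}]") for lo, hi in pairs]
    return params


# Three variable-sized buckets: 10-20, 20-40 and 40-90.
params = facet_query_params("price", [10, 20, 40, 90])
print(urlencode(params))
```

Verbose, but each bucket is spelled out explicitly, so there is nothing to misread about where a range starts or ends.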

I'm also not convinced that it really makes sense in use cases like this (where 
you want variable sized buckets) to specify the *gap sizes* as a list, instead 
of specifying the *boundaries* on each bucket.

What you are describing almost feels like it should be a new category of 
faceting -- or a variation on range faceting that doesn't involve the 
start/end/gap params at all (but could still respect facet.range.include and 
facet.range.other)

Here's my counter-proposal/suggestion...

I'm imagining a facet.range.buckets param that (if present) would override 
facet.range.gap, facet.range.start, and facet.range.end (so using facet.range  
would require *either* buckets or start/end/gap).  facet.range.buckets would 
take a comma separated list of values representing the specific values you 
wanted to see used to define adjoining range boundary points, with some syntax 
("..." seems natural) indicating "repeat the last range size until reaching 
this next value"

so you could say...

{code}facet.range=price&facet.range.buckets=0,10,25,50,100,...,300{code}

...and the resulting ranges computed would be...

{code}0-10, 10-25, 25-50, 50-100, 100-150, 150-200, 200-250, 250-300{code}

...likewise you could say...

{code}facet.range=age&facet.range.buckets=0,1,...,18,25,40,60,...,100{code}

...and you would get ranges for each year from 0 to 18, followed by 18-25, 
25-40, 40-60, 60-80, 80-100.
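A minimal sketch of how such a buckets list might be expanded, assuming "..." means "repeat the size of the range just before it until the next listed value". facet.range.buckets is only a proposal here, and {{expand_buckets}} is a made-up name; the final repeated range is cut off at the next explicit value rather than running past it.

```python
def expand_buckets(spec):
    """Turn a buckets string like "0,10,25,...,100" into (low, high) ranges.

    Hypothetical parser for the proposed syntax, not existing Solr behavior.
    """
    points = []
    repeat = False
    for token in (t.strip() for t in spec.split(",")):
        if token == "...":
            if len(points) < 2 or repeat:
                raise ValueError("'...' needs two boundary values before it")
            repeat = True
            continue
        value = float(token) if "." in token else int(token)
        if points and value <= points[-1]:
            raise ValueError("boundaries must be strictly increasing")
        if repeat:
            # Repeat the previous range size, stopping short of `value`.
            step = points[-1] - points[-2]
            while points[-1] + step < value:
                points.append(points[-1] + step)
            repeat = False
        points.append(value)
    if repeat:
        raise ValueError("'...' must be followed by a boundary value")
    return list(zip(points, points[1:]))


print(expand_buckets("0,10,25,50,100,...,300"))
```

For the price example this yields the eight ranges listed above, and the age example yields 18 one-year ranges followed by 18-25, 25-40, 40-60, 60-80, 80-100.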

The tricky situations would be things like...

# {code}facet.range.buckets=0,2,5,...,10{code}
# {code}facet.range.buckets=0,7,...,10,20{code}

...the first _could_ be dealt with using facet.range.hardend like we do today 
(so the resulting buckets were "0-2,2-5,5-8,8-11") but i don't think it should. 
  I think it should result in "0-2,2-5,5-8,8-10" ... it's hard to imagine 
letting a param like facet.range.hardend override the explicit "10" in the 
buckets list when we aren't programmatically generating buckets of precisely 
the same size, particularly when you consider the implications that would carry 
over to the second case (i *really* can't imagine letting that produce any 
ranges other than "0-7,7-10,10-20")
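The difference between the two behaviors can be seen in a tiny sketch of just the repeat step. {{fill}} is a hypothetical helper: clamp=False mimics the way hardend=false lets the last range overshoot, while clamp=True treats the explicit value as a hard stop.

```python
def fill(low, step, stop, clamp=True):
    """Repeat `step`-sized ranges starting at `low` until `stop` is reached.

    Hypothetical helper illustrating the two ways to end a repeated run.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    ranges = []
    while low < stop:
        high = low + step
        if clamp:
            high = min(high, stop)  # never run past the explicit boundary
        ranges.append((low, high))
        low = high
    return ranges


# Repeating a range size of 3 from 2 toward 10, both ways.
print(fill(2, 3, 10, clamp=False))
print(fill(2, 3, 10, clamp=True))
```

Without clamping, repeating a size of 7 from 0 toward 10 produces 0-7 and 7-14, and that second range overlaps an explicitly requested 10-20 bucket; with clamping it stops at 7-10.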

So yeah ... that's what i think would make more sense than letting you specify 
a comma separated list in the "gaps" param ... fundamentally i think it comes 
down to the point i alluded to earlier in this comment: is specifying a 
sequence of varying gap sizes more intuitive for this type of use case than 
specifying a sequence of boundary points? i don't think it is.

(PS: i think the discussion about dynamically generating range points based on 
stats in the index should really be tracked in a distinct issue ... it's got a 
lot of complexity to it that we've talked about on the mailing list a few times 
that i don't really want to try and get into now)

> Facet Range Gaps
> ----------------
>
>                 Key: SOLR-2366
>                 URL: https://issues.apache.org/jira/browse/SOLR-2366
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting 
> needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
> and one were doing spatial distance calculations, one could facet by function 
> into 3 different sized buckets: walking distance (0-5KM), driving distance 
> (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
> quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
> to be a comma separated list of sizes for each bucket.  If only one value is 
> specified, then it behaves as it currently does.  Otherwise, it creates the 
> different size buckets.  If the number of buckets doesn't evenly divide up 
> the space, then the size of the last bucket specified is used to fill out the 
> remaining space (not sure on this)
> For instance,
> facet.range.start=0
> facet.range.end=400
> facet.range.gap=5,25,50,100
> would yield buckets of:
> 0-5,5-30,30-80,80-180,180-280,280-380,380-400

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

