[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099315#comment-13099315
 ] 

Hoss Man commented on SOLR-2366:
--------------------------------

Jan: I took a look at r3 of your VariableRangeGaps wiki; here are the things 
I'm concerned about because they seem a bit confusing/ambiguous....

1) we need to decide what the behavior should be when the spec identifies 
values out of order (ie: {{10, 50, 30}}) ... it might be tempting to say "allow 
them, and swap the values" (ie: "10-50, 30-50") but the merit of that approach 
doesn't seem worth the potential risk of silently hiding errors (ie: if the 
user made a typo and meant "10-50, 50-130") not to mention it could be really 
hard to understand what's going on in the case where some values are specified 
absolutely and some are specified as increments (see bullet #3 in my "02/Apr/11 
23:43" comment above -- ie: what ranges would we produce for 
{{10,20,+50,+100,120,150}} ?).  

I would suggest defining any case where the spec contains an absolute value N 
after (effective) value M where N < M as an error, and failing fast.  
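
As an illustration only (this is not code from the patch or the wiki), the 
fail-fast rule could be sketched in Python as below. The function name 
effective_values is hypothetical, and the sketch assumes plain numeric tokens 
only: a leading "+" marks an increment relative to the previous effective 
value, and anything else is an absolute value. It does not handle {{*}}, 
{{..}} or date math.

```python
def effective_values(spec):
    """Resolve a comma-separated spec into absolute boundary values.

    Hypothetical helper: tokens starting with '+' are increments relative to
    the previous effective value; all other tokens are absolute numbers.
    """
    values = []
    for token in spec.split(","):
        token = token.strip()
        if token.startswith("+"):
            if not values:
                raise ValueError("increment %r has no preceding value" % token)
            value = values[-1] + float(token[1:])
        else:
            value = float(token)
            # Fail fast: an absolute value N may not be lower than the
            # effective value M that precedes it.
            if values and value < values[-1]:
                raise ValueError(
                    "absolute value %g appears after effective value %g"
                    % (value, values[-1]))
        values.append(value)
    return values
```

Under these assumptions, {{10,20,+50,+100,120,150}} resolves to the effective 
values 10, 20, 70, 170 and is then rejected at 120, and {{10,50,30}} is 
rejected at 30 rather than being silently reordered.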

Still not sure what (if anything) should be done about overlapping ranges that 
appear out of order (ie: {{0,100,50..90,150}} ... is that "0-100,50-90,90-150" 
?)

2) Independent of my opinion on the {{*}} syntax, I'm a little concerned by the 
discrepancy in these examples...

{noformat}
facet.range.spec=*,10,50,100,250,* - gives 5 ranges: MIN-10, 10-50, 50-100, 
100-250, 250->MAX
facet.range.spec=*,10,+40,+50,250,* - gives exactly the same ranges, using 
relative gap size
facet.range.spec=0,+10,50,250,* - gives ranges: 0-10, 10-20, 20-30, 30-40, 
40-50, 20-250, 250-MAX
facet.range.spec=0,10,50,+50,+100,* - gives ranges: 0-10, 10-50, 50-100, 
100-200, 200-300 repeating until max
{noformat}

The first three examples suggest that {{*}} will be treated as "-Infinity" and 
"+Infinity" based on position (ie: the first and last ranges will be unbounded 
on one end) but in the last example the wording "...100-200, 200-300 repeating 
until max" seems inconsistent with that.  

In general, I'm concerned about providing a feature that would attempt to 
produce an infinite number of range queries, but even if that is 
intentional/acceptable the discrepancy in syntax bothers me -- I would suggest 
that the last example should result in the ranges "0-10, 10-50, 50-100, 
100-200, 200-Infinity".

If we want to support the idea of "repeat the last increment continuously" that 
should be done with its own "repeat" syntax such as the "..." (three dots) I 
suggested in comment "17/Feb/11 23:50" above.  I would argue that this should 
only be legal after an increment and before a concrete value (ie: 
{{0,+10,...,100}}).  Requiring it to follow an increment seems like a given 
(otherwise what exactly are you repeating?); requiring that it be followed by 
an absolute value is based on my concern that if it's the last item in the spec 
(or the last item before {{*}}) it results in an infinite number of ranges.
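
To make the suggested semantics concrete, here is a minimal Python sketch 
(illustrative only, not Solr code). It assumes numeric tokens, treats {{*}} as 
-Infinity or +Infinity depending on position, never repeats a trailing 
increment implicitly, and accepts "..." only between an increment and an 
absolute value. The name expand_spec is hypothetical, and ordering validation 
of absolute values is deliberately left out.

```python
import math

def expand_spec(spec):
    """Expand a range spec into a list of (lower, upper) pairs."""
    tokens = [t.strip() for t in spec.split(",")]
    bounds = []
    prev_step = None  # set only when the previous token was an increment
    for i, tok in enumerate(tokens):
        step = None
        if tok == "*":
            # '*' means -Infinity first and +Infinity last, nothing else.
            if i == 0:
                bounds.append(-math.inf)
            elif i == len(tokens) - 1:
                bounds.append(math.inf)
            else:
                raise ValueError("'*' is only allowed first or last")
        elif tok == "...":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if (prev_step is None or nxt is None
                    or nxt in ("*", "...") or nxt.startswith("+")):
                raise ValueError(
                    "'...' must follow an increment and precede an "
                    "absolute value")
            # Repeat the last increment until the next absolute value,
            # which the following loop iteration appends itself.
            target = float(nxt)
            while bounds[-1] + prev_step < target:
                bounds.append(bounds[-1] + prev_step)
        elif tok.startswith("+"):
            if not bounds or math.isinf(bounds[-1]):
                raise ValueError("increment %r needs a finite start" % tok)
            step = float(tok[1:])
            if step <= 0:
                raise ValueError("increment must be positive: %r" % tok)
            bounds.append(bounds[-1] + step)
        else:
            bounds.append(float(tok))
        prev_step = step
    return list(zip(bounds, bounds[1:]))
```

With this sketch, {{0,10,50,+50,+100,*}} yields 0-10, 10-50, 50-100, 100-200, 
200-Infinity, while {{0,+10,...,50}} yields five ranges of width 10. A spec 
such as {{0,+10,...,*}} is rejected, so the number of ranges is always finite.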

3) The final comment on the page says (in section about facet.range.spec) ...

{quote}
This parameter can be combined with facet.range.include, but is mutually 
exclusive to facet.range.gap, facet.range.begin, facet.range.end and 
facet.range.other, resulting in an exception if uncompatible mix is attempted. 
{quote}

That seems like it isn't specific enough about what is/isn't going to be 
allowed -- particularly since all of the facet.range params can be specified on 
a per field basis.  

Imagine an index of "historic people" docs that provides range faceting on a 
bunch of date fields for significant milestones using common facet.range.start, 
facet.range.end, facet.range.gap params - and the solr admin wants to add 
"facet.range=height" and a "f.height.facet.range.spec" param....  

{code}
facet.range=birth_date
facet.range=first_notable_historic_event
facet.range=last_notable_historic_event
facet.range=death_date
facet.range.start=1500-01-01T00:00:00Z
facet.range.end=NOW/YEAR+1YEAR
facet.range.hardend=false
facet.range.gap=+10YEARS
facet.range=height
f.height.facet.range.spec=*,100,+10,...,300,*
{code}

...that should be a totally legal use case, right? To mix and match this way?  
But how will the code behave?  Technically the "height" field has both a 
facet.range.spec and facet.range.start params specified and there is no way to 
"unset" the default facet.range.start/facet.range.end/facet.range.gap params in 
the context of the "height" field.
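
The problem can be modelled with a small Python sketch (illustrative only, not 
Solr code). It assumes per-field lookup works as "f.<field>.<name>, falling 
back to the global <name>", and applies a naive reading of the "mutually 
exclusive" rule after that fallback. The helper names field_param and 
naive_conflicts are hypothetical.

```python
def field_param(params, field, name):
    """Look up f.<field>.<name>, falling back to the global <name>."""
    return params.get("f.%s.%s" % (field, name), params.get(name))

def naive_conflicts(params, field):
    """Return the params a naive 'mutually exclusive' check would flag."""
    if field_param(params, field, "facet.range.spec") is None:
        return []
    exclusive = ["facet.range.start", "facet.range.end",
                 "facet.range.gap", "facet.range.other"]
    return [n for n in exclusive
            if field_param(params, field, n) is not None]

# The request from the example above, as a plain dict.
params = {
    "facet.range": ["birth_date", "first_notable_historic_event",
                    "last_notable_historic_event", "death_date", "height"],
    "facet.range.start": "1500-01-01T00:00:00Z",
    "facet.range.end": "NOW/YEAR+1YEAR",
    "facet.range.hardend": "false",
    "facet.range.gap": "+10YEARS",
    "f.height.facet.range.spec": "*,100,+10,...,300,*",
}

print(naive_conflicts(params, "height"))
print(naive_conflicts(params, "birth_date"))
```

In this model the "height" field is flagged for facet.range.start, 
facet.range.end and facet.range.gap even though the admin only set them as 
global defaults for the date fields, while "birth_date" is not flagged at all.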

4) Related to the same sentence as #3, it says that facet.range.include can be 
used with facet.range.spec, but it doesn't explain how it will be interpreted 
-- this is kind of important since values like "outer" define how the "before" 
and "after" ranges are affected, and values like "edge" affect the "first" and 
"last" "gap ranges".  

Should all ranges produced by facet.range.spec be considered "gap" ranges?  
even the ones with no lower/upper bound?   

What would the following combination mean...

{code}
facet.range.spec=100,150,200,250,*
facet.range.include=outer
facet.range.include=edge
{code}

* Are "100" and "250" considered "edge" boundaries?  
* Is "250" considered an "outer" boundary (on the equivalent of an "after" 
range) ?

What about when the spec includes overlapping ranges?

{code}
facet.range.spec=50..150,100..200,150,*
facet.range.include=outer
facet.range.include=edge
{code}

* Is "200" an "edge" boundary?
* Is "150" an "outer" boundary?



> Facet Range Gaps
> ----------------
>
>                 Key: SOLR-2366
>                 URL: https://issues.apache.org/jira/browse/SOLR-2366
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.4, 4.0
>
>         Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting 
> needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
> and one were doing spatial distance calculations, one could facet by function 
> into 3 different sized buckets: walking distance (0-5KM), driving distance 
> (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
> quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
> to be a comma separated list of sizes for each bucket.  If only one value is 
> specified, then it behaves as it currently does.  Otherwise, it creates the 
> different size buckets.  If the number of buckets doesn't evenly divide up 
> the space, then the size of the last bucket specified is used to fill out the 
> remaining space (not sure on this)
> For instance,
> facet.range.start=0
> facet.range.end=400
> facet.range.gap=5,25,50,100
> would yield buckets of:
> 0-5,5-30,30-80,80-180,180-280,280-380,380-400
