[
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099315#comment-13099315
]
Hoss Man commented on SOLR-2366:
--------------------------------
Jan: I took a look at r3 of your VariableRangeGaps wiki; here are the things
I'm concerned about because they seem a bit confusing/ambiguous....
1) we need to decide what the behavior should be when the spec identifies
values out of order (ie: {{10, 50, 30}}) ... it might be tempting to say "allow
them, and swap the values" (ie: "10-50, 30-50") but the merit of that approach
doesn't seem worth the potential risk of silently hiding errors (ie: if the
user made a typo and meant "10-50, 50-130"), not to mention it could be really
hard to understand what's going on in the case where some values are specified
absolutely and some are specified as increments (see bullet #3 in my "02/Apr/11
23:43" comment above -- ie: what ranges would we produce for
{{10,20,+50,+100,120,150}} ?).
I would suggest defining any case where the spec contains an absolute value N
after an (effective) value M where N < M as an error, and failing fast.
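To make the fail-fast rule concrete, here is a minimal sketch (not the patch,
and not Solr code). The function name {{effective_boundaries}} is hypothetical,
values are treated as plain numbers, and {{*}} and {{..}} are deliberately left
out. It assumes a leading {{+}} marks an increment relative to the previous
effective value.

```python
def effective_boundaries(spec):
    """Resolve a comma-separated spec into absolute boundary values.

    Hypothetical helper: tokens starting with '+' are increments relative to
    the previous effective value, everything else is an absolute value.
    Raises ValueError as soon as an absolute value N follows an effective
    value M with N < M.
    """
    boundaries = []
    for token in spec.split(","):
        token = token.strip()
        if token.startswith("+"):
            if not boundaries:
                raise ValueError("increment %r has no preceding value" % token)
            value = boundaries[-1] + float(token[1:])
        else:
            value = float(token)
            if boundaries and value < boundaries[-1]:
                raise ValueError(
                    "absolute value %r follows larger effective value %r"
                    % (value, boundaries[-1])
                )
        boundaries.append(value)
    return boundaries
```

Under these assumptions {{10,50,30}} is rejected at {{30}}, and
{{10,20,+50,+100,120,150}} is rejected at {{120}}, because the two increments
have already moved the effective value to 170.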
Still not sure what (if anything) should be done about overlapping ranges that
appear out of order (ie: {{0,100,50..90,150}} ... is that "0-100,50-90,90-150"
?)
2) Independent of my opinion on the {{*}} syntax, I'm a little concerned by the
discrepancy in these examples...
{noformat}
facet.range.spec=*,10,50,100,250,* - gives 5 ranges: MIN-10, 10-50, 50-100,
100-250, 250->MAX
facet.range.spec=*,10,+40,+50,250,* - gives exactly the same ranges, using
relative gap size
facet.range.spec=0,+10,50,250,* - gives ranges: 0-10, 10-20, 20-30, 30-40,
40-50, 20-250, 250-MAX
facet.range.spec=0,10,50,+50,+100,* - gives ranges: 0-10, 10-50, 50-100,
100-200, 200-300 repeating until max
{noformat}
The first three examples suggest that {{*}} will be treated as "-Infinity" and
"+Infinity" based on position (ie: the first and last ranges will be unbounded
on one end) but in the last example the wording "...100-200, 200-300 repeating
until max" seems inconsistent with that.
In general, I'm concerned about providing a feature that would attempt to
produce an infinite number of range queries, but even if that is
intentional/acceptable the discrepancy in syntax bothers me -- I would suggest
that that sequence should result in the ranges "0-10, 10-50, 50-100, 100-200,
200-Infinity".
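As a sketch of the positional reading of {{*}} suggested above (again
hypothetical, not the patch): {{expand_spec}} is an invented name, values are
plain numbers, and restricting {{*}} to the first or last position is my
assumption.

```python
import math


def expand_spec(spec):
    """Expand a spec into (lower, upper) pairs.

    Hypothetical helper: a leading '*' means -Infinity, a trailing '*' means
    +Infinity, and '+N' is an increment on the previous boundary. Nothing is
    ever repeated, so the number of ranges is always finite.
    """
    tokens = [t.strip() for t in spec.split(",")]
    boundaries = []
    for i, token in enumerate(tokens):
        if token == "*":
            if i == 0:
                value = -math.inf
            elif i == len(tokens) - 1:
                value = math.inf
            else:
                raise ValueError("'*' is only allowed as the first or last item")
        elif token.startswith("+"):
            if not boundaries or math.isinf(boundaries[-1]):
                raise ValueError("increment %r needs a concrete previous value" % token)
            value = boundaries[-1] + float(token[1:])
        else:
            value = float(token)
        boundaries.append(value)
    # Adjacent boundaries form the ranges.
    return list(zip(boundaries, boundaries[1:]))
```

With this reading {{0,10,50,+50,+100,*}} yields exactly five ranges, the last
one being 200 to +Infinity, and {{*,10,+40,+50,250,*}} yields the same ranges
as {{*,10,50,100,250,*}}.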
If we want to support the idea of "repeat the last increment continuously" that
should be with its own "repeat" syntax such as the "..." (three dots) I
suggested in comment "17/Feb/11 23:50" above. I would argue that this should
only be legal after an increment and before a concrete value (ie:
{{0,+10,...,100}}). Requiring it to follow an increment seems like a given
(otherwise what exactly are you repeating?); requiring that it be followed by an
absolute value is based on my concern that if it's the last item in the spec
(or the last item before {{*}}) it results in an infinite number of ranges.
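A rough sketch of how those two legality rules for "..." could be enforced
(hypothetical; {{expand_repeats}} is an invented name and {{*}} handling is
omitted). One thing the rules above do not settle is what happens when the
increment does not evenly divide the distance to the following absolute value;
this sketch assumes the repetition stops short and the absolute value closes
the last, smaller range.

```python
def expand_repeats(spec):
    """Resolve a spec containing '...' into absolute boundary values.

    Hypothetical helper: '...' repeats the immediately preceding increment
    until the next absolute value would be reached or passed. It is rejected
    unless it follows an increment and is followed by an absolute value.
    """
    tokens = [t.strip() for t in spec.split(",")]
    boundaries = []
    last_step = None  # set only when the previous token was an increment
    for i, token in enumerate(tokens):
        if token == "...":
            if last_step is None:
                raise ValueError("'...' must follow an increment")
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is None or nxt in ("*", "...") or nxt.startswith("+"):
                raise ValueError("'...' must be followed by an absolute value")
            limit = float(nxt)
            while boundaries[-1] + last_step < limit:
                boundaries.append(boundaries[-1] + last_step)
            last_step = None
        elif token.startswith("+"):
            if not boundaries:
                raise ValueError("increment %r has no preceding value" % token)
            last_step = float(token[1:])
            if last_step <= 0:
                raise ValueError("increment must be positive: %r" % token)
            boundaries.append(boundaries[-1] + last_step)
        else:
            boundaries.append(float(token))
            last_step = None
    return boundaries
```

Here {{0,+10,...,100}} resolves to the eleven boundaries 0, 10, 20, up to 100,
while {{0,+10,...}} and {{0,+10,...,*}} are both rejected, so the number of
ranges is always finite.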
3) The final comment on the page says (in section about facet.range.spec) ...
{quote}
This parameter can be combined with facet.range.include, but is mutually
exclusive to facet.range.gap, facet.range.begin, facet.range.end and
facet.range.other, resulting in an exception if uncompatible mix is attempted.
{quote}
That seems like it isn't specific enough about what is/isn't going to be
allowed -- particularly since all of the facet.range params can be specified on
a per field basis.
Imagine an index of "historic people" docs that provides range faceting on a
bunch of date fields for significant milestones using common facet.range.start,
facet.range.end, facet.range.gap params - and the solr admin wants to add
"facet.range=height" and a "f.height.facet.range.spec" param....
{code}
facet.range=birth_date
facet.range=first_notable_historic_event
facet.range=last_notable_historic_event
facet.range=death_date
facet.range.start=1500-01-01T00:00:00Z
facet.range.end=NOW/YEAR+1YEAR
facet.range.hardend=false
facet.range.gap=+10YEARS
facet.range=height
f.height.facet.range.spec=*,100,+10,...,300,*
{code}
...that should be a totally legal use case, right? To mix and match this way?
But how will the code behave? Technically the "height" field has both a
facet.range.spec and facet.range.start params specified, and there is no way to
"unset" the default facet.range.start/facet.range.end/facet.range.gap params in
the context of the "height" field.
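To illustrate the problem, here is a small sketch that mimics per-field param
lookup with a plain dict (a simplification, not Solr code: {{field_param}} and
{{naive_conflicts}} are hypothetical names, and the lookup just assumes that
{{f.<field>.<param>}} overrides the global {{<param>}}).

```python
# Params that the sketch treats as mutually exclusive with facet.range.spec.
EXCLUSIVE_WITH_SPEC = ("facet.range.start", "facet.range.end", "facet.range.gap")


def field_param(params, field, name):
    """Return f.<field>.<name> if present, else fall back to the global <name>."""
    return params.get("f.%s.%s" % (field, name), params.get(name))


def naive_conflicts(params, field):
    """List the exclusive params that resolve for a field which also has a spec."""
    if field_param(params, field, "facet.range.spec") is None:
        return []
    return [n for n in EXCLUSIVE_WITH_SPEC
            if field_param(params, field, n) is not None]


# Single-valued params from the request above (facet.range itself omitted).
params = {
    "facet.range.start": "1500-01-01T00:00:00Z",
    "facet.range.end": "NOW/YEAR+1YEAR",
    "facet.range.hardend": "false",
    "facet.range.gap": "+10YEARS",
    "f.height.facet.range.spec": "*,100,+10,...,300,*",
}

print(naive_conflicts(params, "height"))
print(naive_conflicts(params, "birth_date"))
```

A check written this way reports all three global params as conflicts for
"height" and none for "birth_date", so a naive "mutually exclusive" rule would
reject a request that looks perfectly reasonable.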
4) Related to the same sentence as #3, it says that facet.range.include can be
used with facet.range.spec, but it doesn't explain how it will be interpreted
-- this is kind of important since values like "outer" define how the "before"
and "after" ranges are affected, and values like "edge" affect the "first" and
"last" "gap ranges".
Should all ranges produced by facet.range.spec be considered "gap" ranges?
even the ones with no lower/upper bound?
What would the following combination mean...
{code}
facet.range.spec=100,150,200,250,*
facet.range.include=outer
facet.range.include=edge
{code}
* Are "100" and "250" considered "edge" boundaries?
* Is "250" considered an "outer" boundary (on the equivalent of an "after"
range) ?
What about when the spec includes overlapping ranges?
{code}
facet.range.spec=50..150,100..200,150,*
facet.range.include=outer
facet.range.include=edge
{code}
* Is "200" an "edge" boundary?
* Is "150" an "outer" boundary?
> Facet Range Gaps
> ----------------
>
> Key: SOLR-2366
> URL: https://issues.apache.org/jira/browse/SOLR-2366
> Project: Solr
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting
> needs to be evenly spaced. For instance, if and when SOLR-1581 is completed
> and one were doing spatial distance calculations, one could facet by function
> into 3 different sized buckets: walking distance (0-5KM), driving distance
> (5KM-150KM) and everything else (150KM+), for instance. We should be able to
> quantize the results into arbitrarily sized buckets. I'd propose the syntax
> to be a comma separated list of sizes for each bucket. If only one value is
> specified, then it behaves as it currently does. Otherwise, it creates the
> different size buckets. If the number of buckets doesn't evenly divide up
> the space, then the size of the last bucket specified is used to fill out the
> remaining space (not sure on this)
> For instance,
> facet.range.start=0
> facet.range.end=400
> facet.range.gap=5,25,50,100
> would yield buckets of:
> 0-5,5-30,30-80,80-180,180-280,280-380,380-400