[jira] [Commented] (SOLR-2366) Facet Range Gaps

Hoss Man (Commented) (JIRA) Mon, 10 Oct 2011 17:10:57 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124591#comment-13124591
 ]


Hoss Man commented on SOLR-2366:
--------------------------------


Jan: I've got to be completely honest here -- catching up on this issue, I got 
really confused and lost by some of your comments and the updated docs.

This sequence of comments really stands out to me...

{quote}
I have no good answer to this, other than inventing some syntax.
...
I think the values facet.range.include=upper/lower is clear. Outer/edge would 
need some more work/definition.
...
*My primary reason for suggesting this is to give users a terse, intuitive 
syntax for ranges.*
...
One thing this improvement needs to tackle is how to return the range buckets 
in the Response. It will not be enough with the simple range_facet format ... 
We need something which can return the explicit ranges,
{quote}

(emphasis added by me)

I really liked the simplicity of your earlier proposal, and I agree that it 
would be really powerful/helpful to give users a terse, intuitive syntax for 
specifying sequential ranges of variable sizes -- but it seems like we're 
really moving away from the syntax being "intuitive" because of the hoops 
you're having to jump through to treat this as an extension of the existing 
"facet.range" param in your design.

I think we really ought to revisit my earlier suggestion to approach this as an 
entirely new "type" of faceting - not a new plugin or a contrib, but a new 
first-class type of faceting that FacetComponent would support, right 
alongside facet.field, facet.query, and facet.range.  Let's ignore everything about 
the existing facet.range.* param syntax, and the facet_range response format, 
and think about what makes the most sense for this feature on its own.  If 
there are ideas from facet.range that make sense to carry over (like 
facet.range.include) then great -- but let's approach it from the "something 
new that can borrow from facet.range" standpoint instead of the "extension to 
facet.range that has a bunch of caveats with how facet.range already works" 
standpoint.

I mean: if it looks like a duck, walks like a duck, and quacks like a duck, 
then i'm happy to call it a duck -- but in this case:
 * doesn't make sense with facet.range.other
 * needs special start/end syntax to play nice with facet.range.start/end
 * needs to change the response format

...ie: it doesn't look the same, it doesn't walk the same, and it doesn't quack.

---

Regardless of whether this functionality becomes part of facet.range or not, I 
wanted to comment specifically on this idea...

bq. If all gaps are specified as explicit ranges this is no ambiguity, so we 
could require all gaps to be explicit ranges if one wants to use it?

This seems like a really harsh limitation to impose.  If the only way to use an 
explicit range is in use cases where you *only* use explicit ranges, then what 
value add does this feature give you over just using multiple facet.query 
params? (it might be marginally fewer characters, but multiple facet.query 
params seem more intuitive and easier to read).  I mean: I don't have a 
solution to propose, it just seems like there's not much point in supporting 
explicit ranges in that case.
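
For comparison, here is a rough sketch of the "multiple facet.query params" alternative mentioned above. The field name "price" and the helper name facet_query_params are made up for illustration; the only Solr pieces assumed are the existing facet=true and facet.query params and the standard [low TO high] range query syntax.

```python
from urllib.parse import urlencode, parse_qsl


def facet_query_params(field, ranges):
    """Build one facet.query param per explicit (low, high) range.

    Hypothetical helper for illustration only. Square brackets make both
    end points inclusive, so a value sitting exactly on a shared boundary
    is counted in two adjacent buckets.
    """
    params = [("facet", "true")]
    for low, high in ranges:
        params.append(("facet.query", f"{field}:[{low} TO {high}]"))
    return params


params = facet_query_params("price", [(0, 10), (10, 20), (20, 50), (50, 100)])
query_string = urlencode(params)  # ready to append to a select URL
```

Every range has to be spelled out in full this way, with no relative increments, which is the verbosity a dedicated syntax would be trying to avoid.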

---

Having not thought about this issue in almost a month, and revisiting it with 
(fairly) fresh eyes, and thinking about all the use cases that have been 
discussed, it seems like the main goals we should address are really:

 * an intuitive syntax for specifying end points for ranges of varying sizes
 * ability to specify range end points using either fixed values or increments
 * ability to specify that ranges should either use sequential end points, 
or be overlapping relative to some fixed min/max value

In other words: the only reason (that i know of) why overlapping ranges even 
came up in this issue was use cases like...

{noformat}
   Price: $0-10, $0-20, $0-50, $0-100
   Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
{noformat}

...there doesn't seem to be much motivation for using overlapping ranges 
in the "middle" of a sequence, and these types of use cases where *all* the 
ranges overlap seem just as important as use cases where the ranges don't 
overlap at all...

{noformat}
   Price: $0-10, $10-20, $20-50, $50-100
   Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
{noformat}

...so let's try to focus on a syntax that makes both easy, using both fixed and 
relative values, w/o worrying about supporting arbitrary overlapping ranges 
(since I can't think of a use case for it, and it could always be achieved 
using facet.query)

So how about something like...

{noformat}
 facet.sequence=<fieldname>
 facet.sequence.spec=[<wild>,]?<val>,<relval>[,<relval>]*[,<wild>]?
 facet.sequence.type=[before|after|between]
 facet.sequence.include=(same as facet.range.include)
{noformat}

Where "relval" would either be a concrete value, or a relative value; the 
effective sequence has to either increase or decrease consistently or it's an 
error; and "facet.sequence.type" determines whether the ranges are overlapping 
("before" and "after") or not ("between")

So if you had a spec like this...
{noformat}
 facet.sequence.spec=0,10,+10,50,+50
{noformat}

Then depending on facet.sequence.type you could either get...

{noformat}
 facet.sequence.type=after
     Price: $0-10, $0-20, $0-50, $0-100
 facet.sequence.type=between
     Price: $0-10, $10-20, $20-50, $50-100
 facet.sequence.type=before
     Price: $0-100, $10-100, $20-100, $50-100
{noformat}
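
To make the expansion above concrete, here is a minimal sketch of how such a spec might be turned into ranges. It is only an illustration of the proposed semantics, not code from any patch: the function names parse_spec and expand are invented, it handles integer values only (no dates, no "*"), and it assumes that a token with a leading "+" or "-" is relative to the previous end point.

```python
def parse_spec(spec):
    """Resolve a spec such as '0,10,+10,50,+50' into absolute end points.

    Assumption: tokens starting with '+' or '-' are increments relative to
    the previous end point; everything else is an absolute value. (This
    means negative absolute values cannot be expressed in this sketch.)
    """
    points = []
    for token in spec.split(","):
        token = token.strip()
        if token[0] in "+-":
            if not points:
                raise ValueError("spec must start with an absolute value")
            points.append(points[-1] + int(token))
        else:
            points.append(int(token))
    # The effective sequence must consistently increase or decrease.
    steps = [b - a for a, b in zip(points, points[1:])]
    if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
        raise ValueError("sequence must consistently increase or decrease")
    return points


def expand(points, seq_type):
    """Turn end points into (low, high) ranges for the given sequence type."""
    ordered = sorted(points)
    low, high = ordered[0], ordered[-1]
    if seq_type == "between":   # sequential, non-overlapping ranges
        return list(zip(ordered, ordered[1:]))
    if seq_type == "after":     # every range starts at the minimum
        return [(low, p) for p in ordered[1:]]
    if seq_type == "before":    # every range ends at the maximum
        return [(p, high) for p in ordered[:-1]]
    raise ValueError("unknown type: " + seq_type)


points = parse_spec("0,10,+10,50,+50")   # [0, 10, 20, 50, 100]
print(expand(points, "after"))    # [(0, 10), (0, 20), (0, 50), (0, 100)]
print(expand(points, "between"))  # [(0, 10), (10, 20), (20, 50), (50, 100)]
print(expand(points, "before"))   # [(0, 100), (10, 100), (20, 100), (50, 100)]
```

Note that this sketch always returns ranges in ascending order, which is a simplification; a real implementation would probably want to preserve the order in which the spec was written.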

"*" could be used at the start or end to indicate that you wanted an unbounded 
range, but it wouldn't be a factor in determining the "fixed point" used if 
type was "after" or "before", ie...

{noformat}
 f.price.facet.sequence.spec=*,0,10,+10,50,+50,*
 f.created.facet.sequence.spec=NOW,-1DAY,-1MONTH,-1YEAR

 facet.sequence.type=after
     Price: below $0, $0-10, $0-20, $0-50, $0-100, $100 and up
     Created: NOW-1YEAR TO NOW, NOW-1YEAR TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
 facet.sequence.type=between
     Price: below $0, $0-10, $10-20, $20-50, $50-100, $100 and up
     Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
 facet.sequence.type=before
     Price: below $0, $0-100, $10-100, $20-100, $50-100, $100 and up
     Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
{noformat}

...if we defined things that way, i *think* that would simplify a lot of the 
complexity we've been talking about, and simplify some of the use cases.

the only remaining issues that have been brought up (that i can think of) that 
would still need to be worked out would be:

1) what the response format needs to look like - I'd vote to punt on this until 
we figure out the semantics.

2) when exactly ranges are inclusive/exclusive of their endpoints - i *think* 
we should be able to reuse the semantics from facet.range.include here, including 
"edge", if we define ranges involving "*" as "outer" ranges, but we'd need to 
work through more scenarios to be sure.

3) what happens if an increment overlaps with an absolute value, ie: my 
original example of "10,20,+50,+100,120,150".  The three possible solutions I 
can think of are:

 * fail loudly
 * implement "precedence" rules, ie: that absolute values trump relative values 
(10-20,20-70,70-120,120-150) or vice-versa (10-20,20-70,70-170)
 * implement precedence rules but let them be controlled via a request param 
(similar to how "facet.range.hardend" works)
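
The three options for issue 3 can be sketched as follows. This is only one possible reading of the precedence rules, under the assumptions that values are integers, the sequence is increasing, and increments are written with a leading "+"; the function name resolve_ranges and the "policy" argument are invented for the example and do not correspond to any existing or proposed request param.

```python
def resolve_ranges(spec, policy="fail"):
    """Resolve a spec whose increments may overshoot later absolute values.

    policy (hypothetical):
      'fail'     - raise as soon as an absolute value is overlapped
      'absolute' - absolute values trump relative values
      'relative' - relative values trump absolute values
    """
    if policy not in ("fail", "absolute", "relative"):
        raise ValueError("unknown policy: " + policy)
    points = []  # list of (value, came_from_increment)
    for token in spec.split(","):
        token = token.strip()
        if token.startswith("+"):
            if not points:
                raise ValueError("an increment needs a preceding end point")
            points.append((points[-1][0] + int(token), True))
            continue
        value = int(token)
        if points and value <= points[-1][0]:
            # Overlap: this absolute value does not extend the sequence.
            if policy == "fail" or not points[-1][1]:
                raise ValueError(token + " does not extend the sequence")
            if policy == "relative":
                continue  # drop the overlapped absolute value
            # policy == 'absolute': discard increments at or beyond it
            while points and points[-1][1] and points[-1][0] >= value:
                points.pop()
            if points and value <= points[-1][0]:
                raise ValueError(token + " does not extend the sequence")
        points.append((value, False))
    values = [v for v, _ in points]
    return list(zip(values, values[1:]))


spec = "10,20,+50,+100,120,150"
print(resolve_ranges(spec, "absolute"))  # [(10, 20), (20, 70), (70, 120), (120, 150)]
print(resolve_ranges(spec, "relative"))  # [(10, 20), (20, 70), (70, 170)]
```

With policy="fail" the same spec raises a ValueError at "120", which corresponds to the "fail loudly" option.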

---

What do you think?  Are there any key use cases / features we've talked about 
that you think this approach overlooks?  Do you still think it should really be 
an extension to "facet.range" ?


                
> Facet Range Gaps
> ----------------
>
>                 Key: SOLR-2366
>                 URL: https://issues.apache.org/jira/browse/SOLR-2366
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.5, 4.0
>
>         Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting 
> needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
> and one were doing spatial distance calculations, one could facet by function 
> into 3 different sized buckets: walking distance (0-5KM), driving distance 
> (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
> quantize the results into arbitrarily sized buckets.
> (Original syntax proposal removed, see discussion for concrete syntax)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

