[ 
https://issues.apache.org/jira/browse/SOLR-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-16291:
--------------------------------
    Status: Patch Available  (was: Open)

> Decay function queries gauss,linear,exponential
> -----------------------------------------------
>
>                 Key: SOLR-16291
>                 URL: https://issues.apache.org/jira/browse/SOLR-16291
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers, search
>    Affects Versions: 9.0
>            Reporter: Dan Rosher
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Description
> This is a Solr version of the decay functions [available in 
> Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-decay].
> To see how the functions work, [see 
> here|https://www.desmos.com/calculator/a7i0bwz5ha].
> Decay functions score a document with a function that decays depending on the 
> distance of a numeric field value of the document from a user given origin. 
> This is similar to a range query, but with smooth edges instead of boxes.
> To use distance scoring on a query that has numerical fields, the user has to 
> define an origin and a scale for each field. The origin is needed to define 
> the “central point” from which the distance is calculated, and the scale to 
> define the rate of decay. The decay function is specified as
>  
> {code:java}
> // numerical/date fields
> <decay_function>(<field-name>,scale,origin,offset,decay)
> // geo fields
> <decay_function>(<field-name>,scale,origin_lat,origin_lon,offset,decay)
> {code}
>  * <decay_function> should be one of 'linear', 'exp', or 'gauss'
>  * The <field-name> must be a single-valued (NOT multi-valued) 
> NumericFieldType, DatePointField, or LatLonPointSpatialField field, e.g. 
> linear("location","23km",52.0247,-0.490,"0km",0.5)
> In the above example, the field is a geo field (LatLonPointSpatialField) and 
> the origin is given as a latitude and a longitude. scale and offset must be 
> given with a unit in this case. If your field is a date field, you can give 
> scale and offset in days, hours, and so on, using DateMath syntax.
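
As a rough illustration of the geo example above, the following Python sketch applies a linear decay to a great-circle distance. The haversine formula, the Earth radius of 6371 km, and the helper names `haversine_km` and `linear_geo` are assumptions made for illustration; the issue does not say how the patch computes the distance internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def linear_geo(doc_lat, doc_lon, scale_km, origin_lat, origin_lon, offset_km=0.0, decay=0.5):
    """Linear decay of the distance between a document and the origin."""
    v = max(0.0, haversine_km(doc_lat, doc_lon, origin_lat, origin_lon) - offset_km)
    s = scale_km / (1.0 - decay)  # chosen so that the score equals decay at v == scale_km
    return max((s - v) / s, 0.0)


# Mirrors linear("location","23km",52.0247,-0.490,"0km",0.5) for three documents.
at_origin = linear_geo(52.0247, -0.490, 23, 52.0247, -0.490)
north_23km = linear_geo(52.0247 + math.degrees(23 / EARTH_RADIUS_KM), -0.490, 23, 52.0247, -0.490)
far_away = linear_geo(48.8566, 2.3522, 23, 52.0247, -0.490)  # Paris, several hundred km away
```

In this sketch a document at the origin keeps its full score, a document 23 km away is scored at the decay value, and a document far beyond the scale is scored 0.
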
>  
> e.g. gauss(pdate,"+2DAY+6HOUR","2021-07-20T00:00:00Z","+3DAY",0.5)
>  
> where:
>  * pdate: the DatePointField
>  * "+2DAY+6HOUR": scale (the range)
>  * "2021-07-20T00:00:00Z": origin (defaults to NOW)
>  * "+3DAY": offset (defaults to zero)
>  * 0.5: decay
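
To make the date example concrete, here is a small Python sketch that evaluates the gauss formula from the "Supported decay functions" section of this issue on dates. The function name `gauss_date` and the choice to measure distances in seconds are assumptions of the sketch, not details taken from the patch.

```python
import math
from datetime import datetime, timedelta, timezone


def gauss_date(doc_date, scale, origin, offset=timedelta(0), decay=0.5):
    """Gaussian decay over the time distance between doc_date and origin."""
    # Distances are measured in seconds here; this unit is an assumption of the sketch.
    dist = abs((doc_date - origin).total_seconds())
    v = max(0.0, dist - offset.total_seconds())
    scale_s = scale.total_seconds()
    sigma_sq = -scale_s ** 2 / (2.0 * math.log(decay))
    return math.exp(-(v ** 2) / (2.0 * sigma_sq))


origin = datetime(2021, 7, 20, tzinfo=timezone.utc)  # "2021-07-20T00:00:00Z"
scale = timedelta(days=2, hours=6)                   # "+2DAY+6HOUR"
offset = timedelta(days=3)                           # "+3DAY"

# A document within the offset keeps the full score.
inside_offset = gauss_date(origin + timedelta(days=1), scale, origin, offset)
# A document at offset + scale from the origin is scored at the decay value.
at_scale = gauss_date(origin + offset + scale, scale, origin, offset)
```
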
>  
>  * *origin* The point of origin used for calculating distance. Must be given 
> as a number for numeric fields, a date for date fields, and a geo point for 
> geo fields. Required for geo and numeric fields. For date fields the default 
> is NOW. Date math (for example NOW-1h) is supported for origin.
>  * *scale* Required for all types. Defines the distance from origin + offset 
> at which the computed score will equal the decay parameter. For geo fields: 
> Can be defined as number+unit (1km, 12m,...). Default unit is KM. For date 
> fields: Can be defined as a number+unit ("1h", "10d",…). For numeric 
> fields: Any number.
>  * *offset* If an offset is defined, the decay is only computed for documents 
> whose distance from the origin is greater than the defined offset. 
> The default is 0.
>  * *decay* The decay parameter defines how documents are scored at the 
> distance given at scale. If no decay is defined, documents at the distance 
> scale will be scored 0.5.
>  
> To get a feel for how these functions work, see [this graph on 
> desmos|https://www.desmos.com/calculator/a7i0bwz5ha]. Adjust origin, offset, 
> scale and decay to see how these parameters change the equation for 
> gauss, exp or linear.
> h3. Supported decay functions
>  
> The <decay_function> determines the shape of the decay:
> *gauss* Normal decay, computed as:
> score(doc) = exp(-(max(0, |doc.val - origin| - offset))^2 / (2 sig^2))
> where sig is computed to ensure that the score takes the value decay at 
> distance scale from origin+-offset:
> sig^2 = -scale^2 / (2 ln(decay))
> *exp* Exponential decay, computed as:
> score(doc) = exp(lmda * max(0, |doc.val - origin| - offset))
> where again the parameter lmda is computed to ensure that the score takes 
> the value decay at distance scale from origin+-offset:
> lmda = ln(decay) / scale
> *linear* Linear decay, computed as:
> score(doc) = max((s - v) / s, 0)
> where v = max(0, |doc.val - origin| - offset) and s = scale / (1.0 - decay). 
> Again, the parameter s is computed to ensure that the score takes the 
> value decay at distance scale from origin+-offset.
> In contrast to the normal and exponential decay, this function actually sets 
> the score to 0 once v reaches s, which is twice the user given scale value 
> for the default decay of 0.5.
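
The three formulas above can be sketched in a few lines of Python for numeric fields. This is an illustrative transcription of the equations, not the code from the patch; the names `gauss_decay`, `exp_decay`, and `linear_decay` are made up here, and the linear case assumes s = scale / (1.0 - decay), which is the value that makes the score equal decay at distance scale.

```python
import math


def _distance(value, origin, offset):
    """Distance from origin beyond the offset, never negative."""
    return max(0.0, abs(value - origin) - offset)


def gauss_decay(value, scale, origin, offset=0.0, decay=0.5):
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2.0 * sigma_sq))


def exp_decay(value, scale, origin, offset=0.0, decay=0.5):
    lmda = math.log(decay) / scale
    return math.exp(lmda * _distance(value, origin, offset))


def linear_decay(value, scale, origin, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


# value=30 lies exactly scale (10) beyond origin (15) + offset (5),
# so each function should return (approximately) the decay value.
scores_at_scale = [f(30.0, 10.0, 15.0, 5.0, 0.5) for f in (gauss_decay, exp_decay, linear_decay)]

# Only the linear variant reaches 0; here that happens at twice the scale.
linear_at_twice_scale = linear_decay(40.0, 10.0, 15.0, 5.0, 0.5)
```
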
> For single functions the three decay functions together with their parameters 
> can be visualized like this (the field in this example called "age"):
> h3. Detailed example
> Suppose you are searching for a hotel in a certain town. Your budget is 
> limited. Also, you would like the hotel to be close to the town center, so 
> the farther the hotel is from the desired location the less likely you are to 
> check in.
> You would like the query results that match your criterion (for example, 
> "hotel, Nancy, non-smoker") to be scored with respect to distance to the town 
> center and also the price.
> Intuitively, you would like to define the town center as the origin and maybe 
> you are willing to walk 2km to the town center from the hotel. In this case 
> your origin for the location field is the town center and the scale is ~2km.
> If your budget is low, you would probably prefer something cheap over 
> something expensive. For the price field, the origin would be 0 Euros and the 
> scale depends on how much you are willing to pay, for example 20 Euros.
> In this example, the fields might be called "price" for the price of the 
> hotel and "location" for the coordinates of this hotel.
> The function for price in this case could be:
> {noformat}
> gauss("price",20,0) //or linear,exp {noformat}
> and for location:
> {noformat}
> gauss("location","2km",11,12) //or linear,exp{noformat}
> Suppose you want to multiply the original score by these two functions. The 
> request would look like this:
> {noformat}
> b=mul( gauss("price",20,0),gauss("location","2km",11,12)) 
> &q={!boost b=$b v=$qq} 
> &qq={!edismax }*:* 
> &sort=score+desc 
> &fl=*,score{noformat}
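
When sending this request from a client, the parameters need to be URL-encoded. The Python sketch below builds the query string with the standard library; the base URL and the collection name `hotels` are hypothetical and only serve as placeholders.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical Solr base URL and collection name; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/hotels/select"

params = {
    "b": 'mul(gauss("price",20,0),gauss("location","2km",11,12))',
    "q": "{!boost b=$b v=$qq}",
    "qq": "{!edismax }*:*",
    "sort": "score desc",
    "fl": "*,score",
}

# urlencode percent-encodes quotes, braces, and "$", and turns spaces into "+".
query_string = urlencode(params)
request_url = f"{SOLR_SELECT_URL}?{query_string}"
```
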
> Suppose your original search results match three hotels:
>  * "Backpack Nap"
>  * "Drink n Drive"
>  * "BnB Bellevue".
> "Drink n Drive" is pretty far from your defined location (nearly 2 km) and is 
> not too cheap (about 13 Euros), so it gets a low factor of 0.56.
> "BnB Bellevue" and "Backpack Nap" are both pretty close to the defined 
> location, but "BnB Bellevue" is cheaper, so it gets a multiplier of 0.86 
> whereas "Backpack Nap" gets a value of 0.66.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
