[
https://issues.apache.org/jira/browse/SOLR-16291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kevin Risden updated SOLR-16291:
--------------------------------
Status: Patch Available (was: Open)
> Decay function queries gauss,linear,exponential
> -----------------------------------------------
>
> Key: SOLR-16291
> URL: https://issues.apache.org/jira/browse/SOLR-16291
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query parsers, search
> Affects Versions: 9.0
> Reporter: Dan Rosher
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h2. Description
> This is a Solr version of the decay functions [available in
> Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-decay].
> To see how the functions work [see
> here|https://www.desmos.com/calculator/a7i0bwz5ha]
> Decay functions score a document with a function that decays depending on the
> distance of a numeric field value of the document from a user given origin.
> This is similar to a range query, but with smooth edges instead of boxes.
> To use distance scoring on a query that has numerical fields, the user has to
> define an origin and a scale for each field. The origin is needed to define
> the “central point” from which the distance is calculated, and the scale to
> define the rate of decay. The decay function is specified as
>
> {code:java}
> <decay_function>(<field-name>,scale,origin,offset,decay)                 for numerical/date fields
> <decay_function>(<field-name>,scale,origin_lat,origin_lon,offset,decay)  for geo fields
> {code}
> * <decay_function> should be one of 'linear', 'exp', or 'gauss'
> * The <field-name> must be a NumericFieldType, DatePointField, or
> LatLonPointSpatialField field, and must NOT be multi-valued, e.g.
> linear("location","23km",52.0247, -0.490,"0km",0.5)
> In the above example, the field is a geo field, so the origin is given as a
> latitude and a longitude, and scale and offset must be given with a unit. If
> your field is a date field, you can give scale and offset in days, hours, and
> so on, as with DateMath.
>
> e.g. gauss(pdate,"+2DAY+6HOUR","2021-07-20T00:00:00Z","+3DAY",0.5)
>
> * pdate: DatePointField
> * "+2DAY+6HOUR": scale
> * "2021-07-20T00:00:00Z": origin (defaults to NOW)
> * "+3DAY": offset (defaults to zero)
> * 0.5: decay
>
> * *origin* The point of origin used for calculating distance. Must be given
> as a number for numeric fields, a date for date fields, and a geo point for
> geo fields. Required for geo and numeric fields. For date fields the default
> is NOW. Date math (for example NOW-1h) is supported for origin.
> * *scale* Required for all types. Defines the distance from origin + offset
> at which the computed score equals the decay parameter. For geo fields: can
> be defined as number+unit (1km, 12m, ...). The default unit is km. For date
> fields: can be defined as number+unit ("1h", "10d", ...). For numeric
> fields: any number.
> * *offset* If an offset is defined, the decay is only applied to documents
> whose distance from the origin is greater than the offset; documents within
> the offset keep a score of 1. The default is 0.
> * *decay* The decay parameter defines how documents are scored at the
> distance given at scale. If no decay is defined, documents at the distance
> scale will be scored 0.5.
>
> To get a feel for how these functions work, see [this Desmos
> graph|https://www.desmos.com/calculator/a7i0bwz5ha]. Adjust origin, offset,
> scale and decay to see how these parameters change the curves for gauss, exp
> and linear.
> h3. Supported decay functions
>
> The <decay_function> determines the shape of the decay:
> *gauss* Normal decay, computed as:
> score(doc) = exp(-max(0, |doc.val - origin| - offset)^2 / (2 sigma^2))
> where sigma is computed to ensure that the score takes the value decay at
> distance scale from origin +/- offset:
> sigma^2 = -scale^2 / (2 ln(decay))
> *exp* Exponential decay, computed as:
> score(doc) = exp(lambda . max(0, |doc.val - origin| - offset))
> where again the parameter lambda is computed to ensure that the score takes
> the value decay at distance scale from origin +/- offset:
> lambda = ln(decay) / scale
> *linear* Linear decay, computed as:
> score(doc) = max((s - v) / s, 0)
> where v = max(0, |doc.val - origin| - offset) and s = scale / (1.0 - decay).
> Again, the parameter s is computed to ensure that the score takes the value
> decay at distance scale from origin +/- offset.
> In contrast to the normal and exponential decay, this function actually sets
> the score to 0 once v reaches s, which is twice the user-given scale value
> for the default decay of 0.5.
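The three formulas above can be checked numerically with a short Python sketch. This is not the code from the patch; the function names (decay_gauss, decay_exp, decay_linear) and the helper _distance are hypothetical, and only plain numeric values are handled (no date or geo parsing).

```python
import math


def _distance(value, origin, offset):
    """Distance from the origin beyond the offset: max(0, |value - origin| - offset)."""
    return max(0.0, abs(value - origin) - offset)


def decay_gauss(value, scale, origin, offset=0.0, decay=0.5):
    # sigma^2 is chosen so that the score equals `decay` at distance `scale`.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = _distance(value, origin, offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def decay_exp(value, scale, origin, offset=0.0, decay=0.5):
    # lambda is negative because ln(decay) < 0 for 0 < decay < 1.
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def decay_linear(value, scale, origin, offset=0.0, decay=0.5):
    # s is the distance at which the score reaches 0.
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


if __name__ == "__main__":
    # Illustrative values only: origin 0, scale 20, offset 5, default decay.
    for fn in (decay_gauss, decay_exp, decay_linear):
        print(fn.__name__, fn(25.0, 20.0, 0.0, offset=5.0))
```

With these definitions, each function returns 1 for values within the offset of the origin and returns decay at distance offset + scale, which is the property the formulas are built to guarantee.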
> (Figure not included: the three decay functions and their parameters,
> plotted for a single field called "age".)
> h3. Detailed example
> Suppose you are searching for a hotel in a certain town. Your budget is
> limited. Also, you would like the hotel to be close to the town center, so
> the farther the hotel is from the desired location the less likely you are to
> check in.
> You would like the query results that match your criteria (for example,
> "hotel, Nancy, non-smoker") to be scored with respect to distance to the town
> center and also the price.
> Intuitively, you would like to define the town center as the origin and maybe
> you are willing to walk 2km to the town center from the hotel. In this case
> your origin for the location field is the town center and the scale is ~2km.
> If your budget is low, you would probably prefer something cheap over
> something expensive. For the price field, the origin would be 0 Euros and the
> scale depends on how much you are willing to pay, for example 20 Euros.
> In this example, the fields might be called "price" for the price of the
> hotel and "location" for the coordinates of this hotel.
> The function for price in this case could be:
> {noformat}
> gauss("price",20,0) //or linear,exp {noformat}
> and for location:
> {noformat}
> gauss("location","2km",11,12) //or linear,exp{noformat}
> Suppose you want to multiply the original score by these two functions. The
> request would look like this:
> {noformat}
> b=mul( gauss("price",20,0),gauss("location","2km",11,12))
> &q={!boost b=$b v=$qq}
> &qq={!edismax }*:*
> &sort=score+desc
> &fl=*,score{noformat}
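When sending this request from a client, the parameters need URL encoding. A minimal Python sketch using only the standard library follows; the host, port and collection name ("hotels") are assumptions for illustration and are not part of the issue.

```python
from urllib.parse import urlencode

# Parameters copied from the request above; values are passed unencoded
# and urlencode() takes care of escaping quotes, braces and spaces.
params = {
    "b": 'mul(gauss("price",20,0),gauss("location","2km",11,12))',
    "q": "{!boost b=$b v=$qq}",
    "qq": "{!edismax }*:*",
    "sort": "score desc",
    "fl": "*,score",
}

query_string = urlencode(params)

# Hypothetical Solr endpoint: adjust host and collection to your setup.
url = "http://localhost:8983/solr/hotels/select?" + query_string

if __name__ == "__main__":
    print(url)
```

Keeping the boost function in its own parameter (b) and referencing it with $b, as the request above does, means only one value has to change when switching between gauss, linear and exp.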
> Suppose your original search results match three hotels:
> * "Backpack Nap"
> * "Drink n Drive"
> * "BnB Bellevue"
> "Drink n Drive" is pretty far from your defined location (nearly 2 km) and is
> not too cheap (about 13 Euros), so it gets a low factor of 0.56.
> "BnB Bellevue" and "Backpack Nap" are both pretty close to the defined
> location, but "BnB Bellevue" is cheaper, so it gets a multiplier of 0.86
> whereas "Backpack Nap" gets a value of 0.66.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)