[ https://issues.apache.org/jira/browse/SOLR-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187574#comment-13187574 ]

Jan Høydahl commented on SOLR-2827:
-----------------------------------

Example usage:

{code}
<processor class="org.apache.solr.update.processor.RegexpBoostProcessorFactory">
  <bool name="enabled">true</bool>
  <str name="inputField">url</str>
  <str name="boostField">urlboost</str>
  <str name="boostFilename">${solr.solr.home}/conf/rank/urlboosts.txt</str>
</processor>
{code}

Sample urlboosts.txt file:
{noformat}
# Sample config file for RegexpBoostProcessor
# This example applies boosts based on the "url" field to boost or deboost certain URLs
# All rules are evaluated, and if several of them match, the boosts are multiplied.
# For example, if one rule with boost 2.0 and one rule with boost 0.1 both match, the resulting urlboost=0.2

https?://[^/]+/old/.* 0.1               #Comments are removed
https?://[^/]+/.*index\([0-9]\).html$   0.5

# Prioritize certain sites over others
https?://www.mydomain.no/.*     1.5
{noformat}
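To make the rule-file semantics concrete, here is a minimal Java sketch of parsing such a file and computing the multiplied boost. It is not the code from SOLR-2827.patch. The class name BoostRules, whole-value matching with Matcher.matches(), and the comment handling (a line starting with # is skipped, a trailing comment must be preceded by whitespace) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the urlboosts.txt semantics; not the SOLR-2827 patch.
// Assumptions: a line starting with '#' is a comment, a trailing comment must be
// preceded by whitespace, each rule is "<regex> <boost>" with no whitespace inside
// the regex, and a rule must match the entire input value.
class BoostRules {

    // One parsed rule: the compiled regex and the factor applied when it matches.
    record Rule(Pattern pattern, double boost) {}

    private final List<Rule> rules = new ArrayList<>();

    BoostRules(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or full-line comment
            }
            line = line.replaceFirst("\\s+#.*$", ""); // drop a trailing comment
            String[] parts = line.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad rule line: " + raw);
            }
            rules.add(new Rule(Pattern.compile(parts[0]), Double.parseDouble(parts[1])));
        }
    }

    // Every rule is evaluated; the boosts of all matching rules are multiplied.
    // A value that matches no rule keeps the neutral boost 1.0.
    double boostFor(String value) {
        double boost = 1.0;
        for (Rule rule : rules) {
            if (rule.pattern().matcher(value).matches()) {
                boost *= rule.boost();
            }
        }
        return boost;
    }
}
```

With the sample file above, http://www.mydomain.no/old/page.html matches both the /old/ rule and the domain rule, so its boost is 0.1 * 1.5 (about 0.15), while a URL that matches no rule keeps 1.0.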

The output boost field can then be used at query time to tune relevance.
                
> RegexpBoost Update Processor
> ----------------------------
>
>                 Key: SOLR-2827
>                 URL: https://issues.apache.org/jira/browse/SOLR-2827
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>              Labels: UpdateProcessor
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2827.patch
>
>
> Processor which reads a string field and outputs a float field with a boost
> value if the input string matches one of several regular expressions.
> The processor reads a separate file with one regex per line, each with an
> associated boost value.
> We used it to (de)boost web pages based on URL patterns. It could be used for
> many other use cases as well.
> Kindly donated by Oslo University.
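The per-document step described in the issue (read a string field, write a numeric boost field) could look roughly like the following Solr-free Java sketch. A plain Map stands in for the document being indexed, the rules are assumed to be already parsed, and RegexBoostStep is a hypothetical name. Leaving documents without the input field untouched is also an assumption, not something the issue states.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the per-document step; not the UpdateRequestProcessor
// code from the patch. A Map stands in for the document being indexed.
class RegexBoostStep {
    private final String inputField; // e.g. "url", as in the example config
    private final String boostField; // e.g. "urlboost"
    private final List<Map.Entry<Pattern, Double>> rules;

    RegexBoostStep(String inputField, String boostField, List<Map.Entry<Pattern, Double>> rules) {
        this.inputField = inputField;
        this.boostField = boostField;
        this.rules = rules;
    }

    // Reads the input string, multiplies the boosts of all fully matching rules and
    // writes the product to the boost field. Assumption: documents without the
    // input field are left unchanged.
    void process(Map<String, Object> doc) {
        Object value = doc.get(inputField);
        if (value == null) {
            return;
        }
        String text = value.toString();
        double boost = 1.0;
        for (Map.Entry<Pattern, Double> rule : rules) {
            if (rule.getKey().matcher(text).matches()) {
                boost *= rule.getValue();
            }
        }
        doc.put(boostField, boost);
    }
}
```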
