Hi all,

In our Solr 6 setup we use string payloads to boost certain tokens (URIs).
These strings are mapped to floats via a schema parameter, "PayloadMapping",
which is read by our custom WKSimilarity class (extending TFIDFSimilarity).

<fieldType name="uri_payload" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity" delimiter="|"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <similarity class="com.wolterskluwer.atlas.solr.similarities.WKSimilarityFactory">
        <str name="BM25k1a">0.4</str>
        <str name="BM25k1b">0.4</str>
        <str name="BM25b">0.5</str>
        <str name="IDFCurveFactor">0</str>
        <str name="sloppyFreqCurveFactor">0.0</str>
        <str name="PayloadBoost">10.0</str>
        <str name="PayloadImpact">3.0</str>
        <str name="PayloadCurveFactor">1.0</str>
        <str name="PayloadMapping">isAbout=15.0,coversFiscalPeriod=10.0,type=5.0,hasTheme=5.0,subject=4.0,mentions=2.0,creator=2.0</str>
    </similarity>
</fieldType>

The reason for this indirection is convenience: by storing payload strings
instead of floats, we can change and tune the boosts simply by updating the
schema, without having to change the content set.
Inside WKSimilarity each payload string is mapped to its corresponding boost
value, and the final boost is applied via the scorePayload method (where we
can tune the boost curve via some additional schema parameters). This works
well in Solr 6.
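
To illustrate the mapping step, here is a minimal, Lucene-independent sketch
(not the actual WKSimilarity code). The class name PayloadMappingSketch, its
method names, and the neutral default boost of 1.0 for unmapped payloads are
assumptions made for this example; only the "key=value,key=value" format comes
from the schema above. How the looked-up value is combined with PayloadBoost,
PayloadImpact and PayloadCurveFactor is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the real WKSimilarity code.
final class PayloadMappingSketch {

    // Assumption: a payload string without a mapping entry gets a neutral boost.
    static final float DEFAULT_BOOST = 1.0f;

    // Parses a "key=value,key=value" string, as in the PayloadMapping
    // schema parameter, into a map from payload string to boost value.
    static Map<String, Float> parse(String mapping) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String entry : mapping.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed mapping entry: " + trimmed);
            }
            String key = trimmed.substring(0, eq).trim();
            float value = Float.parseFloat(trimmed.substring(eq + 1).trim());
            boosts.put(key, value);
        }
        return boosts;
    }

    // Decodes the raw payload bytes (stored unchanged by the "identity"
    // encoder) as UTF-8 and looks up the configured boost.
    static float boostFor(Map<String, Float> boosts, byte[] payload, int offset, int length) {
        String label = new String(payload, offset, length, StandardCharsets.UTF_8);
        return boosts.getOrDefault(label, DEFAULT_BOOST);
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = parse("isAbout=15.0,coversFiscalPeriod=10.0,type=5.0");
        byte[] payload = "isAbout".getBytes(StandardCharsets.UTF_8);
        System.out.println(boostFor(boosts, payload, 0, payload.length));
    }
}
```

The (bytes, offset, length) signature is chosen so the lookup can be fed
directly from a payload byte slice, without assuming any particular Lucene
type.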

The problem: we are about to migrate to Solr 8, and after LUCENE-8014 it is no
longer possible to override the scorePayload method in WKSimilarity (it has
been removed from TFIDFSimilarity). I wonder what alternatives there are for
mapping string payloads to floats and using them in a tunable boosting formula.

Thanks,
Tom Burgmans
