Hi Team,

I am new to Apache Solr, so I may be missing something obvious. I am trying
to remove duplicates from the search results in Solr 8.6 using
solr.ShingleFilterFactory and solr.MinHashFilterFactory. Here is the
snippet:

<fieldType name="text_min_hash" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="3" maxShingleSize="4"
                outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
        <filter class="solr.MinHashFilterFactory" bucketCount="512"
                hashSetSize="1" hashCount="1" withRotation="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
</fieldType>

However, duplicates still appear in the results. Kindly let me know if I am
missing something; any leads would be appreciated.
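To check that I understand what the shingle + MinHash chain is supposed to produce, here is a rough Python sketch of the idea. It is a simplified textbook MinHash (64 seeded SHA-1 hashes). It is not a reproduction of Lucene's MinHashFilter, which uses its own hashing and bucketing. The helper names `word_shingles`, `minhash_signature` and `estimated_jaccard` are my own, and the sketch leaves out the unigrams my config also emits. My understanding is that near-duplicate documents share most of their minimum hash values, so their signatures overlap heavily.

```python
import hashlib


def word_shingles(text, min_size=3, max_size=4):
    # Word n-grams of length min_size..max_size. Unigrams are deliberately
    # left out here (the config above also emits them via outputUnigrams="true").
    words = text.lower().split()
    shingles = set()
    for n in range(min_size, max_size + 1):
        for i in range(len(words) - n + 1):
            shingles.add(" ".join(words[i:i + n]))
    return shingles


def _hash(seed, shingle):
    # 64-bit hash derived from SHA-1; each seed acts as a separate hash function.
    digest = hashlib.sha1(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(shingles, num_hashes=64):
    # Classic MinHash: keep the minimum hash value per seeded hash function.
    if not shingles:
        raise ValueError("need at least one shingle")
    return [min(_hash(seed, s) for s in shingles) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    # The fraction of positions where the minimums agree approximates
    # the Jaccard similarity of the two shingle sets.
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox jumps over the lazy cat"
    sa, sb = word_shingles(doc_a), word_shingles(doc_b)
    exact = len(sa & sb) / len(sa | sb)
    approx = estimated_jaccard(minhash_signature(sa), minhash_signature(sb))
    print(f"exact Jaccard {exact:.3f}, MinHash estimate {approx:.3f}")
```

Running it prints the exact Jaccard similarity of the two example sentences next to the MinHash estimate. The two numbers should land close to each other.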

Thanks & Regards,
-Sourav.
