Re: New implementation of MLT

Osullivan L . Wed, 03 Apr 2013 05:09:03 -0700

Greetings,

I have a custom analyzer that converts Library of Congress call numbers into 
normalized strings:


    <fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="org.vufind.solr.analysis.LCCNormalizeFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="callnumber-normalized" type="LCNormalized" indexed="true" stored="true" />

Thus, values like:

PQ239.Z56
PQ239.H63 2008
PQ239.S62 1982
PQ239.B68 1983
PQ2390.S35 A5
PQ2390.S35 B8 1898
PQ2389 .R65 F3 1854 t.1
PQ239.A7 1969
PQ2.N6 1959
PQ22.A4 D47 1949
PQ238.L57 1985

become:

PQ 0239.000000 Z0.560000
PQ 0239.000000 H0.630000 002008
PQ 0239.000000 S0.620000 001982
PQ 0239.000000 B0.680000 001983
PQ 2390.000000 S0.350000 A0.500000
PQ 2390.000000 S0.350000 B0.800000 001898
PQ 2389.000000 R0.650000 F0.300000 001854 T.000001
PQ 0002.000000 N0.600000 001959
PQ 0022.000000 A0.400000 D0.470000 001949
PQ 0238.000000 L0.570000 001985

This allows items to be sorted accurately by call number.
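
To illustrate why the padded form sorts correctly, here is a minimal Python sketch that approximates the normalization shown above. It is not the code of LCCNormalizeFilterFactory; the function name normalize_lc and its regular expressions are hypothetical, and it only handles class letters, the class number, cutters and bare numbers such as years (volume suffixes like "t.1" are not handled).

```python
import re

def normalize_lc(callnumber):
    """Rough, hypothetical approximation of the normalized form shown above."""
    text = callnumber.strip().upper()
    # Class letters followed by the class number, e.g. "PQ" and "239".
    head = re.match(r"([A-Z]+)\s*(\d+)(?:\.(\d+))?", text)
    if head is None:
        return text  # not recognisable; leave unchanged
    letters, whole, frac = head.group(1), head.group(2), head.group(3) or ""
    # Zero-pad the class number to 4 digits and its fraction to 6 places.
    parts = [letters, f"{int(whole):04d}.{frac:0<6}"]
    # Remaining pieces: cutters (letter + digits) or bare numbers such as years.
    for letter, digits, number in re.findall(r"([A-Z])(\d+)|(\d+)", text[head.end():]):
        if letter:
            parts.append(f"{letter}0.{digits:0<6}")   # cutter as a decimal fraction
        else:
            parts.append(f"{int(number):06d}")        # year padded to 6 digits
    return " ".join(parts)

calls = ["PQ2389 .R65 F3 1854", "PQ239.Z56", "PQ22.A4 D47 1949"]
print(sorted(calls))                    # plain string order
print(sorted(calls, key=normalize_lc))  # order by normalized form
```

In plain string order PQ2389 sorts before PQ239, because "8" is compared with "9" character by character; with the padded form, 0239 sorts before 2389 as intended.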

I would also like to perform range searches on the normalized call number. 
However, whereas callnumber-normalized=[DS+TO+FE] correctly lists items with 
call numbers between DS and FE, starting with DT* and finishing with FD*, 
callnumber-normalized=[DS763+TO+FE] incorrectly also starts at DT* and 
finishes with FD*.

Can anyone explain why this might be the case?

Looking at 
http://wiki.apache.org/solr/MultitermQueryAnalysis#Current_components_that_implement_MultiTermAwareComponent,
 would I have to add one of the MultiTermAware Factories to make this work?
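
To make the question concrete, here is a small Python sketch of what I suspect may be happening. It assumes (and this is only an assumption) that the range endpoints are compared as plain strings against the indexed, normalized terms without passing through the normalizing filter. The indexed values and the in_range helper are hypothetical, made up for illustration.

```python
# Hypothetical indexed terms, already in normalized form.
indexed = [
    "DS 0735.000000 A0.200000",
    "DS 0763.000000 B0.500000",
    "DS 0801.000000 C0.300000",
    "DT 0001.000000",
    "FD 0010.000000",
    "FE 0001.000000",
]

def in_range(terms, lower, upper):
    """Inclusive range check using plain string comparison."""
    return [t for t in terms if lower <= t <= upper]

# Assumption: the lower endpoint reaches the index un-normalized.
# "DS763" sorts after every "DS 0..." term because " " < "7".
raw = in_range(indexed, "DS763", "FE")

# The same range with a normalized lower endpoint.
normalized = in_range(indexed, "DS 0763.000000", "FE")

print(raw)
print(normalized)
```

Under that assumption the un-normalized endpoint skips every DS term and the results start at DT, which matches what I am seeing, while the normalized endpoint starts at DS 0763.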

Thanks,

Luke


--
Luke O'Sullivan
Systems Developer
Web Team
Swansea University, Singleton Park, Swansea SA2 8PP, UK
l.osulli...@swansea.ac.uk
01792 602772
@l_os_cymru
