On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:

> On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
>> Greetings,
>>
>> (using acts_as_ferret)
>>
>> So I have a book titled "Möngrel „Horsemen“" in my index.
>>
>> Searching for "Möngrel" retrieves the document.
>>
>> But I would like a search for "Mongrel" to retrieve the document as
>> well, which it currently does not.
>>
>> Anyone have any good solutions to this problem?
>>
>> I suppose I could filter the documents and queries first with something
>> like:
>>
>> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel „Horsemen“").gsub(/[^a-zA-Z0-9\s]/, "")
>>
>> But perhaps there is a better, or built in solution.
>
> I don't think so - a custom Analyzer would be the right place for
> this.
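The analyzer idea can be sketched in plain Ruby, without Ferret's own classes: split the text into tokens, downcase them, and strip accents, so the same folding runs on documents at index time and on queries at search time. `fold_accents` and `analyze` are hypothetical names, not part of Ferret or acts_as_ferret. The sketch uses `String#unicode_normalize` (Ruby 2.2 and later) in place of Iconv, which was dropped from the standard library in Ruby 2.0.

```ruby
# Hypothetical helper (not a Ferret API): decompose to NFKD, then drop
# combining marks (Unicode category Mn), so "ö" becomes "o".
def fold_accents(str)
  str.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '')
end

# Hypothetical analyzer-like pipeline: split into runs of letters and
# digits, downcase each token, then fold its accents. Punctuation such
# as „ and “ never matches \p{Alnum}, so it is dropped.
def analyze(text)
  text.scan(/\p{Alnum}+/).map { |tok| fold_accents(tok.downcase) }
end

p analyze("Möngrel „Horsemen“")
p analyze("Mongrel")
```

Both the stored title and the query end up as the token "mongrel", which is what makes the match possible. NFKD only removes marks Unicode treats as combining, so letters such as ø, ł, đ and æ pass through unchanged; an explicit mapping table like the one below is still needed for those.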

We use a normalizer both when storing and when querying (to be revised for Rails 1.2):

   # Utility method that returns an ASCIIfied, downcased, and sanitized string.
   # It relies on the Unicode Hacks plugin by means of String#chars. We assume
   # $KCODE is 'u' in environment.rb. For now we support a wide range of Latin
   # accented letters, based on the Unicode Character Palette bundled with Macs.
   def self.normalize(str)
     n = str.chars.downcase.strip.to_s
     n.gsub!(/[àáâãäåāă]/,    'a')
     n.gsub!(/æ/,            'ae')
     n.gsub!(/[ďđ]/,          'd')
     n.gsub!(/[çćčĉċ]/,       'c')
     n.gsub!(/[èéêëēęěĕė]/,   'e')
     n.gsub!(/ƒ/,             'f')
     n.gsub!(/[ĝğġģ]/,        'g')
     n.gsub!(/[ĥħ]/,           'h')
     n.gsub!(/[ìíîïīĩĭįı]/,   'i')
     n.gsub!(/ĳ/,            'ij')
     n.gsub!(/ĵ/,             'j')
     n.gsub!(/[ķĸ]/,          'k')
     n.gsub!(/[łľĺļŀ]/,       'l')
     n.gsub!(/[ñńňņŉŋ]/,      'n')
     n.gsub!(/[òóôõöøōőŏ]/,   'o')
     n.gsub!(/œ/,            'oe')
     n.gsub!(/ą/,             'a')
     n.gsub!(/[ŕřŗ]/,         'r')
     n.gsub!(/[śšşŝș]/,       's')
     n.gsub!(/[ťţŧț]/,        't')
     n.gsub!(/[ùúûüūůűŭũų]/,  'u')
     n.gsub!(/ŵ/,             'w')
     n.gsub!(/[ýÿŷ]/,         'y')
     n.gsub!(/[žżź]/,         'z')
     n.gsub!(/\s+/,            ' ')
     n.gsub!(/[^\sa-z0-9_-]/,   '')
     n
   end
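On Ruby 2.4 or later, where String#downcase handles non-ASCII letters, the same table can be written without the Unicode Hacks plugin. The sketch below keeps the module name MyAppUtils that the acts_as_ferret wrapper calls, but it is a reworking under that assumption, not the code actually in use: single letters go through one String#tr call, built from a hash so the source and target strings always line up, and only the ligatures need gsub.

```ruby
module MyAppUtils
  # Letters that fold to a single ASCII letter, grouped by target.
  FOLDS = {
    'a' => 'àáâãäåāăą',
    'c' => 'çćčĉċ',
    'd' => 'ďđ',
    'e' => 'èéêëēęěĕė',
    'f' => 'ƒ',
    'g' => 'ĝğġģ',
    'h' => 'ĥħ',
    'i' => 'ìíîïīĩĭįı',
    'j' => 'ĵ',
    'k' => 'ķĸ',
    'l' => 'łľĺļŀ',
    'n' => 'ñńňņŉŋ',
    'o' => 'òóôõöøōőŏ',
    'r' => 'ŕřŗ',
    's' => 'śšşŝș',
    't' => 'ťţŧț',
    'u' => 'ùúûüūůűŭũų',
    'w' => 'ŵ',
    'y' => 'ýÿŷ',
    'z' => 'žżź'
  }.freeze

  # tr needs two strings of equal length; building TO from the same
  # hash keeps every accented letter aligned with its ASCII target.
  FROM = FOLDS.values.join
  TO   = FOLDS.map { |ascii, accented| ascii * accented.length }.join

  # Ligatures expand to two letters, so they cannot go through tr.
  LIGATURES = { 'æ' => 'ae', 'œ' => 'oe', 'ĳ' => 'ij' }.freeze

  def self.normalize(str)
    n = str.to_s.downcase.strip
    LIGATURES.each { |from, to| n = n.gsub(from, to) }
    n = n.tr(FROM, TO)
    n = n.gsub(/\s+/, ' ')
    n.gsub(/[^\sa-z0-9_-]/, '')
  end
end

puts MyAppUtils.normalize("Möngrel „Horsemen“")
```

Adding a letter then means touching only the hash, and a misaligned tr pair cannot creep in.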

And this convenience class method to use in Rails models with acts_as_ferret (slightly edited):

   # Wrapper function to normalize fields before calling acts_as_ferret.
   #
   # Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
   #
   # Please note that your queries should use a "_normalized" suffix on
   # each field, e.g. +field1_normalized:foo
   class ActiveRecord::Base
     def self.index_fields(fields, *options)
       aaf_fields = []
       fields.each do |f|
         # Define a reader such as #title_normalized that returns the
         # normalized value of the original attribute (nil becomes "").
         define_method("#{f}_normalized") do
           MyAppUtils.normalize(send(f).to_s)
         end
         aaf_fields.push :"#{f}_normalized"
       end
       # Pass the options as a real hash instead of building a string to
       # eval, so symbol and string values keep their types.
       aaf_options = { :fields => aaf_fields }
       options.each { |option_pair| aaf_options.merge!(option_pair) }
       logger.info "acts_as_ferret #{aaf_options.inspect}"
       acts_as_ferret(aaf_options)
     end
   end
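On the query side, user input has to go through the same folding and be pointed at the "_normalized" fields. The sketch below assumes Ferret's query-parser syntax of `+field:term` for required terms, as in the `+field1_normalized:foo` example above. `fold` and `normalized_query` are hypothetical helpers, and `fold` is a simplified stand-in for MyAppUtils.normalize built on `String#unicode_normalize` (Ruby 2.2+).

```ruby
# Hypothetical stand-in for MyAppUtils.normalize: strip combining
# accents, downcase, and drop anything outside [\sa-z0-9_-].
def fold(str)
  str.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '').downcase.gsub(/[^\sa-z0-9_-]/, '')
end

# Hypothetical query builder: every folded term becomes a required
# clause on the "<field>_normalized" field.
def normalized_query(field, user_input)
  fold(user_input).split.map { |term| "+#{field}_normalized:#{term}" }.join(' ')
end

puts normalized_query(:title, 'Möngrel „Horsemen“')
# prints: +title_normalized:mongrel +title_normalized:horsemen
```

Every term is marked required, so a record must contain all the words; dropping the `+` would leave the combination to the query parser's default operator.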

-- fxn

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
