On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:
> On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
>> Greetings,
>>
>> (using acts_as_ferret)
>>
>> So I have a book title "Möngrel „Horsemen“" in my index.
>>
>> Searching for "Möngrel" retrieves the document.
>>
>> But I would like searching for "Mongrel" to also retrieve the
>> document.
>> Which it does not currently.
>>
>> Anyone have any good solutions to this problem?
>>
>> I suppose I could filter the documents and queries first with
>> something like:
>>
>>
>> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel „Horsemen“").gsub(/[^a-zA-Z0-9]/im,"")
>>
>> But perhaps there is a better, or built in solution.
>
> I don't think so - a custom Analyzer would be the right place for
> this.
We use a normalizer to store/query (to be revised for Rails 1.2):
# Utility method that returns an ASCIIfied, downcased, and sanitized string.
# It relies on the Unicode Hacks plugin by means of String#chars. We assume
# $KCODE is 'u' in environment.rb. So far we support a wide range of Latin
# accented letters, based on the Unicode Character Palette bundled in Macs.
def self.normalize(str)
  # Downcase and trim first, so only lowercase variants need mapping.
  n = str.chars.downcase.strip.to_s
  n.gsub!(/[àáâãäåāăą]/, 'a')
  n.gsub!(/æ/, 'ae')
  n.gsub!(/[ďđ]/, 'd')
  n.gsub!(/[çćčĉċ]/, 'c')
  n.gsub!(/[èéêëēęěĕė]/, 'e')
  n.gsub!(/ƒ/, 'f')
  n.gsub!(/[ĝğġģ]/, 'g')
  n.gsub!(/[ĥħ]/, 'h')
  # į (i with ogonek) and ı (dotless i) are variants of i, not j.
  n.gsub!(/[ìíîïīĩĭįı]/, 'i')
  n.gsub!(/ij/, 'ij')
  n.gsub!(/ĵ/, 'j')
  n.gsub!(/[ķĸ]/, 'k')
  n.gsub!(/[łľĺļŀ]/, 'l')
  n.gsub!(/[ñńňņʼnŋ]/, 'n')
  n.gsub!(/[òóôõöøōőŏ]/, 'o')
  n.gsub!(/œ/, 'oe')
  n.gsub!(/[ŕřŗ]/, 'r')
  n.gsub!(/[śšşŝș]/, 's')
  n.gsub!(/[ťţŧț]/, 't')
  n.gsub!(/[ùúûüūůűŭũų]/, 'u')
  n.gsub!(/ŵ/, 'w')
  n.gsub!(/[ýÿŷ]/, 'y')
  n.gsub!(/[žżź]/, 'z')
  # Collapse runs of whitespace, then drop anything still outside ASCII.
  n.gsub!(/\s+/, ' ')
  n.gsub!(/[^\sa-z0-9_-]/, '')
  n
end
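
For comparison, here is a minimal sketch of the same folding idea on a current Ruby. It assumes Ruby 2.4 or later, where String#downcase is Unicode-aware and String#unicode_normalize is built in, so neither the Unicode Hacks plugin nor $KCODE is involved. The module name FoldSketch and its SPECIAL table are hypothetical, not part of the code above. Rather than listing every accented letter, it decomposes characters and drops the combining marks, keeping an explicit table only for letters that have no decomposition.

```ruby
# Hypothetical helper (assumption: Ruby >= 2.4), not the normalizer above.
module FoldSketch
  # Letters without a canonical decomposition still need explicit mappings.
  SPECIAL = {
    'æ' => 'ae', 'œ' => 'oe', 'ø' => 'o', 'đ' => 'd',
    'ł' => 'l', 'ħ' => 'h', 'ŧ' => 't', 'ı' => 'i'
  }.freeze

  def self.fold(str)
    n = str.downcase.strip
    n = n.unicode_normalize(:nfd)                    # "ö" becomes "o" + U+0308
    n = n.gsub(/\p{Mn}/, '')                         # drop the combining marks
    n = n.gsub(Regexp.union(SPECIAL.keys), SPECIAL)  # map the leftovers
    n = n.gsub(/\s+/, ' ')                           # collapse whitespace
    n.gsub(/[^\sa-z0-9_-]/, '')                      # strip whatever remains
  end
end

FoldSketch.fold("Möngrel „Horsemen“")  # => "mongrel horsemen"
```

The same function would be applied to both the indexed text and the query string, so that "Mongrel" and "Möngrel" end up as the same term.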
And this convenience class method to use in Rails models with
acts_as_ferret (slightly edited):
# Wrapper function to normalize fields before calling acts_as_ferret
#
# Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
#
# Please note that your queries should use a "_normalized" suffix on
# each field, e.g. +field1_normalized:foo
class ActiveRecord::Base
  def self.index_fields(fields, *options)
    aaf_fields = []
    fields.each do |f|
      # Define a <field>_normalized reader for Ferret to index.
      class_eval <<-EOS
        def #{f}_normalized
          MyAppUtils.normalize(#{f})
        end
      EOS
      aaf_fields.push ":#{f}_normalized"
    end
    # Build the acts_as_ferret call as source text, then evaluate it.
    aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
    options.each do |option_pair|
      option_pair.each do |key, value|
        # inspect keeps symbol and string values as valid Ruby literals.
        aaf_call << ", :#{key} => #{value.inspect}"
      end
    end
    logger.info aaf_call
    class_eval(aaf_call)
  end
end
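
The method-generation part of the wrapper can also be sketched without Rails or string evaluation. This is an illustrative variant only: it swaps class_eval on a string for define_method, and the names NormalizedFields, normalized_readers and Book are hypothetical. It assumes Ruby 2.1 or later for the required keyword argument, and a plain lambda stands in for MyAppUtils.normalize.

```ruby
# Hypothetical sketch: generate <field>_normalized readers with define_method.
module NormalizedFields
  # Defines one reader per field and returns the generated names, which
  # could then be handed to an indexer as its field list.
  def normalized_readers(*fields, normalizer:)
    fields.map do |f|
      name = :"#{f}_normalized"
      define_method(name) { normalizer.call(public_send(f)) }
      name
    end
  end
end

class Book
  extend NormalizedFields
  attr_accessor :title

  # Stand-in normalizer; a real one would fold accents as well.
  NORMALIZED = normalized_readers(
    :title, normalizer: ->(s) { s.to_s.downcase.strip }
  )
end

book = Book.new
book.title = "  Mongrel  "
book.title_normalized  # => "mongrel"
```

Avoiding string evaluation means option values never have to round-trip through source text, at the cost of calling the indexing macro separately with the returned names.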
-- fxn