Re: Accented search

Peter Cline Tue, 11 Mar 2008 06:39:38 -0700

I'm not sure about a way to boost scores in this case, but you can achieve the basic matching by applying a filter to the index and the queries. The ISOLatin1AccentFilter seems like it may work for you, though I'm not entirely certain if that will cover all the accent characters you need.

My approach has been to write new filters: one to normalize the Unicode into its "decomposed" form, then one to manually strip out all of the "add-on" characters (those with a decimal codepoint greater than 256). I don't know if this will always work, but it's worked well for me so far.
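
To make the two steps concrete, here is a rough sketch in plain Python rather than as actual Lucene/Solr filters. The function name strip_accents and the max_codepoint parameter are made up for illustration, and the 256 cutoff is just the rule of thumb described above, not a guarantee for every script.

```python
import unicodedata


def strip_accents(text: str, max_codepoint: int = 256) -> str:
    """Decompose text, then drop every character above max_codepoint.

    Hypothetical helper: mirrors the "decompose, then strip add-on
    characters" idea, not any real Solr or Lucene filter.
    """
    # Step 1: canonical decomposition (NFD) splits an accented letter
    # into its base letter followed by combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: the combining marks live at U+0300 and above, so a simple
    # codepoint cutoff removes them and keeps the base Latin letters.
    return "".join(ch for ch in decomposed if ord(ch) <= max_codepoint)


if __name__ == "__main__":
    print(strip_accents("Lập Trình Viên"))
```

Note that the cutoff also throws away any non-Latin letters, and letters that have no decomposition (such as the Vietnamese "đ") are dropped instead of being mapped to a plain letter, so those would need extra handling. Filtering on unicodedata.combining(ch) instead of the codepoint would be a narrower alternative.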

I would test out adding a <filter class="solr.ISOLatin1AccentFilterFactory"/> to your analyzer. It might do the trick. Once again, with this approach I'm not sure how to boost either score, so someone else may have better ideas. I'm pretty new to all of this stuff.


Peter

climbingrose wrote:

Hi guys,

I'm running into some problems with accented (UTF-8) languages. I'd love to
hear some ideas about how to use Solr with those languages. Basically, I
want to achieve what Google does with UTF-8 languages.

My requirements include:
1) Accent insensitive search and proper highlighting:
  For example, we have 2 documents:

  Doc A (title:Lập Trình Viên)
  Doc B (title:Lap Trinh Vien)

  if the user enters "Lập Trình Viên", then Doc B is also matched and "Lập
Trình Viên" is highlighted.
  On the other hand, if the query is "Lap Trinh Vien", Doc A is also
matched.
2) Assign proper scores to accented or non-accented searches:
  if the user enters "Lập Trình Viên", then Doc A should be given a higher
score than Doc B.
  if the query is "Lap Trinh Vien", Doc A should be given a higher score.

Any ideas guys? Thanks in advance!
