Hello,
I am trying to create a suggest search (search results are displayed while the
user is entering the query) for names, but the search should also give results
if the given name just sounds like an indexed name. However a perfect match
should be ranked higher than a similar sounding match.
I looked at the SpellChecker contrib, but this AFAIK cannot handle incomplete
names (edge n-grams).
So I came up with this idea and it would be great if anyone could tell me if
that is sensible or if there is a better way:
I create an analyzer to be run on the full names, which does the following
- lowercase
- build edge n-grams
put these terms in the field (this would handle correctly spelled input)
- run soundex on the n-grams
put there soundexed n-grams in the field as well
The incoming query will then also run through this analyzer with an or-default.
So a correct spelling will match the normal n-grams plus the soundexed n-grams
leading to a good score. A missspelled name would still match the soundexed
n-grams, leading to a somewhat lower score.
My current problem is that I don't know how to duplicate the tokens in the
analyzer so I can add them as normal n-grams and soundexed n-grams. I suppose
the TeeSinkTokenFilter will get me there, but I could not figure out how to add
all tokens back in one stream.
To recap, my questions are: Could this approach work to create a "fuzzy
suggest"? How do I use the TeeSinkTokenFilter to separate and recombine the
tokenstream.
I hope that was clear, thanks for your help!
Kai
Regelung im Bezug auf Paragraph 37a Absatz 4 HGB: WidasConcepts GmbH,
Geschaeftsfuehrer: Thomas Widmann und Christian Kappert,
Gerichtsstand Pforzheim, Registernummer: HRB 511442,
Umsatzsteueridentifikationsnummer: DE205851091
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtuemlich
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Mail.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail sind nicht
gestattet.
This e-mail may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this e-mail in error)
please
notify the sender immediately and destroy this e-mail.
Any unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.