Re: Folding Repeated Letters

2020-10-09 Thread Walter Underwood
Actually, helping the humans to use proper spelling is a good approach. Include a spelling correction step (non-optional) for user-generated content and spelling suggestions for queries. Completion/suggestion is another way to guide people to properly spelled words that exist in your index. I

Re: Folding Repeated Letters

2020-10-09 Thread Alexandre Rafalovitch
Are there that many of those words? Because even if you deal with , there is still yas! Maybe you just have regexp synonyms? (ye+s+) Good luck, 413x On Thu., Oct. 8, 2020, 6:02 p.m. Mike Drob wrote: > I'm looking for a way to transform words with repeated letters into the >
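
To make the regexp suggestion concrete, here is a minimal sketch in plain Python (not Solr configuration) of what the pattern (ye+s+) accepts and rejects. The helper name is_yes_variant is hypothetical, and case-insensitive matching is an assumption added for the upper-case examples in the thread.

```python
import re

# Pattern from the message: one "y", one or more "e", one or more "s".
# IGNORECASE is an assumption so that "YESSS" is treated like "yesss".
YES_VARIANTS = re.compile(r"ye+s+", re.IGNORECASE)


def is_yes_variant(word: str) -> bool:
    """Return True when the whole word matches the ye+s+ pattern."""
    return YES_VARIANTS.fullmatch(word) is not None


for word in ["yes", "YESSS", "yeees", "yyes", "yas"]:
    print(word, is_yes_variant(word))
```

As written, the pattern does not cover a repeated leading "y" or the "yas" spelling, which is the limitation the message points at.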

Re: Folding Repeated Letters

2020-10-09 Thread Erick Erickson
Anything you do will be wrong ;). I suppose you could kick out words that weren’t in some dictionary and accumulate a list of words not in the dictionary and just deal with them “somehow”, but that’s labor-intensive since you then have to deal with proper names and the like. Sometimes you can
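
The dictionary idea could be sketched roughly as follows. This is only an illustration: the function name split_by_dictionary and the tiny DICTIONARY set are hypothetical stand-ins for a real word list.

```python
def split_by_dictionary(words, dictionary):
    """Separate words found in the dictionary from those that are not."""
    known, unknown = [], []
    for word in words:
        if word.lower() in dictionary:
            known.append(word)
        else:
            unknown.append(word)
    return known, unknown


# Hypothetical dictionary; a real one would be far larger.
DICTIONARY = {"yes", "desert", "dessert"}

known, unknown = split_by_dictionary(
    ["yes", "YESSS", "Erickson", "dessert"], DICTIONARY
)
print(known)    # words accepted as-is
print(unknown)  # words left to deal with "somehow"
```

Note that the proper name ends up in the unknown list next to the misspelling, which is the labor-intensive part.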

Re: Folding Repeated Letters

2020-10-08 Thread Mike Drob
I was thinking about that, but there are words that are legitimately different with repeated consonants. My primary school teacher lost hair over getting us to learn the difference between desert and dessert. Maybe we need something that can borrow the boosting behaviour of fuzzy query - match
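
A small sketch of the concern, in plain Python rather than Solr: collapsing repeated letters makes "desert" and "dessert" identical, so one possible mitigation is to score an exact match above a folded-only match. The function names and the weights 1.0 and 0.5 are hypothetical, chosen only to illustrate the ranking idea; they are not how fuzzy query scoring is actually computed.

```python
import re


def fold_repeats(word: str) -> str:
    """Lower-case the word and collapse each run of a repeated character."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


def score(query: str, indexed: str) -> float:
    """Rank exact matches above matches that only agree after folding."""
    if query.lower() == indexed.lower():
        return 1.0  # hypothetical weight for an exact match
    if fold_repeats(query) == fold_repeats(indexed):
        return 0.5  # hypothetical weight for a folded-only match
    return 0.0


print(fold_repeats("dessert"), fold_repeats("desert"))
print(score("dessert", "dessert"), score("dessert", "desert"))
```

With this ordering a query for "dessert" still finds "desert", but documents with the exact spelling rank first.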

Re: Folding Repeated Letters

2020-10-08 Thread Andy Webb
How about something like this? { "add-field-type": [ { "name": "norepeat", "class": "solr.TextField", "analyzer": { "tokenizer": { "class": "solr.StandardTokenizerFactory" },

Folding Repeated Letters

2020-10-08 Thread Mike Drob
I'm looking for a way to transform words with repeated letters into the same token - does something like this exist out of the box? Do our stemmers support it? For example, say I would want all of these terms to return the same search results: YES YESSS YYYEEESSS YYEE[...]S I don't know how
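
For illustration, the desired behaviour can be sketched outside Solr in a few lines of Python. The function name fold_repeated_letters is hypothetical, and lower-casing is an assumption; this is not an existing stemmer or filter.

```python
from itertools import groupby


def fold_repeated_letters(token: str) -> str:
    """Lower-case the token and keep one character per run of repeats."""
    return "".join(letter for letter, _run in groupby(token.lower()))


terms = ["YES", "YESSS", "YYYEEESSS"]
folded = {fold_repeated_letters(term) for term in terms}
print(folded)  # {'yes'}
```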