Ok, so the idea is that you store each term twice: once stemmed (plus ASCII-folded, etc.) and once just lowercased, with a marker character (we used $) added to the lowercased copy to mark it as the "original".
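To make the layout concrete, here is a minimal plain-Java sketch under stated assumptions. It is not the plugin's code and not Lucene's TokenFilter API. All names (DualTermSketch, Token, fold, toyStem, indexTokens, queryTerms) are hypothetical. fold() is a rough stand-in for ASCII folding built on java.text.Normalizer, toyStem() is a toy stand-in for a real stemmer, and appending $ as a suffix (rather than a prefix) is an assumption. Each word yields two tokens. The second has a position increment of 0, following Lucene's convention for stacking a term on the previous token's position. The query side shows one possible way to route searches: words containing non-ASCII characters look up only the marked originals, and plain-ASCII words look up the folded and stemmed terms.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-JDK sketch with hypothetical names; NOT Lucene's TokenFilter API and
// NOT the Hebrew plugin's code. Each word is indexed twice at one position:
// a normalized term, plus the lowercased original with a marker appended.
public class DualTermSketch {

    // Marker for the "original" form; the thread used '$'. Suffix placement is an assumption.
    static final char ORIGINAL_MARKER = '$';

    // A token with its position increment; 0 means "same position as the previous token".
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "/" + positionIncrement;
        }
    }

    // Rough stand-in for ASCII folding: decompose (NFD) and drop combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Hypothetical toy stemmer standing in for a real one: strips one trailing "s".
    static String toyStem(String s) {
        return (s.length() > 3 && s.endsWith("s")) ? s.substring(0, s.length() - 1) : s;
    }

    // True when every character is in the 7-bit ASCII range.
    static boolean isPlainAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Index side: the folded+stemmed term first, then the marked original on the same position.
    static List<Token> indexTokens(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) continue;
            String lower = word.toLowerCase(Locale.ROOT);
            out.add(new Token(toyStem(fold(lower)), 1));    // advances the position
            out.add(new Token(lower + ORIGINAL_MARKER, 0)); // stacked on the same position
        }
        return out;
    }

    // Query side (one possible routing): non-ASCII words match only marked
    // originals; plain-ASCII words match the folded/stemmed terms.
    static List<String> queryTerms(String query) {
        List<String> out = new ArrayList<>();
        for (String word : query.split("\\s+")) {
            if (word.isEmpty()) continue;
            String lower = word.toLowerCase(Locale.ROOT);
            // Folding is a no-op on plain ASCII, so only stemming is applied there.
            out.add(isPlainAscii(lower) ? toyStem(lower) : lower + ORIGINAL_MARKER);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(indexTokens("Caf\u00e9 owners"));
        System.out.println(queryTerms("caf\u00e9 owners"));
        System.out.println(queryTerms("cafe owners"));
    }
}
```

Under these assumptions, indexing "Café owners" produces cafe, café$, owner and owners$ with position increments 1, 0, 1, 0. A query for "cafe" then hits the folded term, while "café" only hits café$. The cost is the one mentioned in the thread: every word contributes two terms to the index.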
You can see it in action here: https://github.com/synhershko/elasticsearch-analysis-hebrew/blob/master/src/main/java/com/code972/elasticsearch/analysis/HebrewIndexingAnalyzer.java#L20 (warning: the plugin is still a work in progress and uses some non-traditional methods to do things).

There are some details to take into account, like how to search for the original, but if you look at the code there you'll get an idea of how it's done.

We also did this for non-Hebrew and non-English texts. It works quite nicely, but it doubles the number of terms in your index.

--
Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>

On Tue, Jan 28, 2014 at 6:09 PM, Nikolas Everett <[email protected]> wrote:

> I'd prefer multiple terms in the same position if I can get away with it.
> That way it'd all be configured by the analyzer so it wouldn't add any
> extra complexity to other languages. It'd take up much less space that way
> as well.
>
>
> On Tue, Jan 28, 2014 at 11:04 AM, Itamar Syn-Hershko
> <[email protected]> wrote:
>
>> You will have to use 2 fields, or multiple terms on the same position. In
>> a recent project we found a nice way of dealing with that on the same
>> field, I hope to have a blog post about that soon..
>>
>>
>> On Tue, Jan 28, 2014 at 6:00 PM, Nikolas Everett <[email protected]> wrote:
>>
>>> I'm looking to make asciifolding optional in my (English) index. If
>>> the user searches without any high ascii characters then I want to match
>>> against the folded tokens. If the user searches with high ascii characters
>>> then I only want to match the unfolded tokens. Is this possible with
>>> Elasticsearch right now?
>>>
>>> Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHTr4Zt2LWsXrQ-fXL49u-azWpGvCqvUh-%2BdN13nYT1SqrOFEQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
