OK, so the idea is that you store each term twice: once stemmed (plus ASCII
folded, plus whatever else your analyzer does) and once just lowercased, and
you add a character (we used $) to the lowercased copy to mark that term as
the "original".
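
To make that concrete, here is a rough sketch in Python. It is not the
plugin's code. It assumes whitespace tokenization, uses Unicode decomposition
as a stand-in for the real stemming/folding chain, and assumes the $ marker is
appended as a suffix. The names fold and index_terms are made up for
illustration.

```python
import unicodedata

# Assumption: the marker is appended as a suffix; the real plugin may differ.
ORIGINAL_MARKER = "$"


def fold(term):
    """Rough ASCII folding: decompose characters and drop combining marks.

    This is only a stand-in for a real stemming + asciifolding chain.
    """
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text):
    """Return (position, term) pairs, emitting two terms per token.

    Both terms share the same position, so phrase queries still line up.
    """
    terms = []
    for position, token in enumerate(text.split()):
        lowered = token.lower()
        terms.append((position, lowered + ORIGINAL_MARKER))  # marked "original"
        terms.append((position, fold(lowered)))               # normalized form
    return terms


if __name__ == "__main__":
    for position, term in index_terms("Caf\u00e9 Noir"):
        print(position, term)
```

Every token ends up in the field twice, once with the marker and once
normalized.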

You can see it in action here:
https://github.com/synhershko/elasticsearch-analysis-hebrew/blob/master/src/main/java/com/code972/elasticsearch/analysis/HebrewIndexingAnalyzer.java#L20
(warning: the plugin is still a work in progress and uses some
non-traditional methods to do stuff)

There are some details to take into account, like how to search for the
original, but if you look at the code there you'll get an idea of how it's
done.
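
On the search side, one possible rule (a sketch only, not what the plugin
does) is to pick which form of the term to look up based on what the user
typed. This follows the behaviour Nik asked for: a query containing non-ASCII
characters goes to the marked original, anything else goes to the folded term.
The name query_term and the suffix placement of $ are assumptions.

```python
# Assumption: originals were indexed lowercased with "$" appended.
ORIGINAL_MARKER = "$"


def query_term(word):
    """Choose which indexed form of a single query word to look up."""
    lowered = word.lower()
    if not lowered.isascii():
        # The user typed accents on purpose: match only the marked original.
        return lowered + ORIGINAL_MARKER
    # Plain ASCII input: match the folded form stored next to the original.
    return lowered


if __name__ == "__main__":
    print(query_term("Caf\u00e9"))
    print(query_term("Cafe"))
```

A real analyzer would also stem the plain ASCII branch so that it matches
whatever was done at index time.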

We also did that for non-Hebrew and non-English texts. It works quite
nicely, but it doubles the number of terms in your index.

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>


On Tue, Jan 28, 2014 at 6:09 PM, Nikolas Everett <[email protected]> wrote:

> I'd prefer multiple terms in the same position if I can get away with it.
> That way it'd all be configured by the analyzer so it wouldn't add any
> extra complexity to other languages.  It'd take up much less space that way
> as well.
>
>
> On Tue, Jan 28, 2014 at 11:04 AM, Itamar Syn-Hershko
> <[email protected]> wrote:
>
>> You will have to use 2 fields, or multiple terms on the same position. In
>> a recent project we found a nice way of dealing with that on the same
>> field, I hope to have a blog post about that soon..
>>
>> On Tue, Jan 28, 2014 at 6:00 PM, Nikolas Everett <[email protected]> wrote:
>>
>>>  I'm looking to make asciifolding optional in my (English) index. If
>>> the user searches without any high ascii characters then I want to match
>>> against the folded tokens. If the user searches with high ascii characters
>>> then I only want to match the unfolded tokens.  Is this possible with
>>> Elasticsearch right now?
>>>
>>> Nik
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3w%2BNHJZQkcCRnEKuowAuObkBTVbHEhnCFpkLH7y0Pa0Q%40mail.gmail.com
>>> .
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>

