Hi Ankush,

A few weeks ago I released an Elasticsearch plugin that lets you override
the default word boundary properties for Unicode characters as implemented
by the StandardTokenizer algorithm. I had the same issue: I wanted to use
the StandardTokenizer but override the word boundary properties for special
characters like '#', '@', etc. (for example, treat them the same way as
'_', which is categorized as an extended num-letter, i.e. ExtendNumLet).

Plugin: https://github.com/bbguitar77/elasticsearch-analysis-standardext

I hope this helps solve your issue.

Thanks
Bryan

On Monday, September 22, 2014 12:19:10 PM UTC-4, Ankush Jhalani wrote:
>
> Just checking back to see if anyone has any ideas. Thanks!
>
> On Friday, September 19, 2014 11:05:59 AM UTC-4, Ankush Jhalani wrote:
>>
>> In our search, we have configured text with 2 analyzers, english and
>> standard, so that we can match phrases with the standard analyzer. We
>> split the keywords on spaces and create a bool query for each word.
>>
>> This works fine in all cases except where the query contains standard
>> word separators like & (ampersand), ; (semicolon), etc. Because the
>> analyzer strips word separators at index time, searching for them
>> returns 0 results. Gist:
>> https://gist.github.com/ajhalani/3def3ea7caec5cd58490
>>
>> I don't want to use a whitespace analyzer because we do actually want to
>> ignore word separators. I was thinking about hacky workarounds like
>> removing all standalone non-alphanumeric characters, or moving them into
>> "should" instead of the default "must" (in case we do use whitespace
>> analyzers in the future).
>>
>> Thanks in advance.
>>
>>
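
For what it's worth, the "should" workaround described in the quoted
message could be sketched roughly as below. This is only an illustration
under stated assumptions, not a tested solution: the function name
build_bool_query and the field name "text" are made up, and it assumes the
query is built client-side as a plain Python dict.

```python
import re

# Matches a token made up only of non-alphanumeric characters, e.g. "&" or ";".
# "_" is added explicitly because \W treats it as a word character.
STANDALONE_SEPARATOR = re.compile(r"[\W_]+")


def build_bool_query(keywords, field="text"):
    """Split keywords on whitespace and build a bool query dict.

    Hypothetical helper: the field name "text" is an assumption.
    Standalone separator tokens go into "should" so they can never
    cause zero hits; every other token stays in "must".
    """
    must, should = [], []
    for token in keywords.split():
        clause = {"match": {field: token}}
        if STANDALONE_SEPARATOR.fullmatch(token):
            should.append(clause)
        else:
            must.append(clause)

    query = {"bool": {"must": must}}
    if should:
        query["bool"]["should"] = should
    return query
```

Calling build_bool_query("tom & jerry") keeps "tom" and "jerry" as required
clauses and demotes "&" to an optional one. A token such as "AT&T" is left
in "must", since it is not a standalone separator.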

