Hi Ankush,

A few weeks ago I released an ElasticSearch plugin that lets you override the default word boundary properties that the StandardTokenizer algorithm assigns to Unicode characters. I had the same issue: I wanted to use the StandardTokenizer but override the word boundary properties for special characters like '#', '@', etc. (for example, to treat them the same way as '_', which is categorized as an extended num-letter, so it does not split a word).
Plugin: https://github.com/bbguitar77/elasticsearch-analysis-standardext

I hope this helps solve your issue.

Thanks,
Bryan

On Monday, September 22, 2014 12:19:10 PM UTC-4, Ankush Jhalani wrote:
> just checking back if anyone has any ideas.. thanks!
>
> On Friday, September 19, 2014 11:05:59 AM UTC-4, Ankush Jhalani wrote:
>> In our search we have configured text with 2 analyzers, english and
>> standard so we can match phrases on the standard-analyzer. We break the
>> keywords by space, and create a bool query for each word.
>>
>> This is working fine for all cases except where the query has standard
>> word-separators like & (ampersand), ; (semi-colon), etc. As
>> word-separators are stripped in index by analyzer, searching for them
>> returns 0 results. Gist:
>> https://gist.github.com/ajhalani/3def3ea7caec5cd58490
>>
>> I don't want to use a whitespace analyzer because we do actually want to
>> ignore word separators. I was thinking about hacky workarounds like
>> removing all standalone non-alphanumeric characters, or moving them in
>> "should" instead of default "must" (in case we do have analyzers in future
>> that are whitespace).
>>
>> Thanks in advance.
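
For reference, the query-side workaround described in the original question (dropping standalone non-alphanumeric tokens, or moving them into "should" instead of "must") could be sketched roughly as below. This is only an illustration, not the code from the gist: the function name `build_bool_query`, the field name "text", and the `keep_separators` flag are all hypothetical, and it builds a plain query body as a dict rather than calling any client library.

```python
def build_bool_query(user_query, field="text", keep_separators=True):
    """Build an Elasticsearch bool query body from a whitespace-split query.

    Assumption: `field` is a hypothetical field name. Tokens containing at
    least one alphanumeric character are required ("must"); standalone
    separators such as "&" or ";" are either made optional ("should") or
    dropped, since the standard analyzer strips them at index time.
    """
    must, should = [], []
    for token in user_query.split():
        clause = {"match": {field: token}}
        if any(ch.isalnum() for ch in token):
            must.append(clause)          # real word: must match
        elif keep_separators:
            should.append(clause)        # standalone separator: optional
        # else: drop the separator entirely

    bool_body = {"must": must}
    if should:
        bool_body["should"] = should
    return {"query": {"bool": bool_body}}


if __name__ == "__main__":
    import json
    print(json.dumps(build_bool_query("salt & pepper"), indent=2))
```

With `keep_separators=False` the separators are removed outright; leaving it as `True` keeps them as optional clauses, which would only start to matter if a whitespace-style analyzer were added to the field later.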
