ElasticSearch standard Analyzer - problematic case

Igor Romanov Thu, 03 Apr 2014 01:53:27 -0700

Hi

I have been looking into some odd analyzer behaviour and am trying to 
understand why it happens and how to fix it.


Here are the tokens I get from the standard analyzer for the text: 
"[email protected]:test1234"

curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty=true' -d '[email protected]:test1234'
{
  "tokens" : [ {
    "token" : "myemail",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "email.com:test1234",
    "start_offset" : 8,
    "end_offset" : 26,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}


So my question is: why do I get "email.com:test1234" as a single token?

Why is it not divided into separate tokens at the . and : characters?

And which analyzer, tokenizer, or filter can I use to get that split?
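
To make the expected result concrete, here is a minimal Python sketch of the split I was hoping for. It assumes that every non-alphanumeric character should act as a separator and that tokens are lowercased; the function name expected_tokens is hypothetical, and the code only emulates the desired output outside Elasticsearch, not what any analyzer actually does.

```python
import re


def expected_tokens(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    # Assumption: any character outside 0-9 and A-Z/a-z is a separator.
    parts = re.split(r"[^0-9A-Za-z]+", text.lower())
    # Drop empty strings produced by leading or trailing separators.
    return [p for p in parts if p]


print(expected_tokens("email.com:test1234"))
```

If this is the right expectation, a tokenizer that splits on a regular expression, such as the pattern tokenizer with its default \W+ pattern, may be one candidate to try, though I have not verified that it behaves this way here.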

Thanks,
Igor

