I need to extract the domain part of email addresses found in free text.
I use the uax_url_email tokenizer so that each email address is emitted as
a single token, and a pattern_capture filter that emits the part matching
the "@(.+)" pattern. However, uax_url_email also returns the ordinary words
that are not email addresses, and the pattern_capture filter does not
remove them. Any suggestions?

"custom_analyzer": {
    "tokenizer": "uax_url_email",
    "filter": [
        "email_domain_filter"
    ]
}
"filter": {
    "email_domain_filter": {
        "type": "pattern_capture",
        "preserve_original": false,
        "patterns": [
            "@(.+)"
        ]
    }
}

*Input string:* "my email id is [email protected]"
*Output tokens:* my, email, id, is, gmail.com

But I need only gmail.com
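
The behaviour above can be reproduced outside Elasticsearch with a rough
Python sketch. It is only a simulation, not Elasticsearch code: the
function name pattern_capture is made up for illustration, a plain
whitespace split stands in for the uax_url_email tokenizer, and
someone@gmail.com is a hypothetical address. It assumes what the Lucene
documentation for the pattern capture filter describes, namely that a
token matching none of the patterns is passed through unchanged even when
preserve_original is false.

```python
import re


def pattern_capture(tokens, patterns, preserve_original=False):
    """Rough stand-in for a pattern_capture token filter (illustration only)."""
    compiled = [re.compile(p) for p in patterns]
    out = []
    for token in tokens:
        # Collect every non-empty capture group from every pattern match.
        captures = [
            group
            for rx in compiled
            for match in rx.finditer(token)
            for group in match.groups()
            if group
        ]
        # Assumption: a token with no captures is kept as-is, which is why
        # the plain words survive the filter.
        if preserve_original or not captures:
            out.append(token)
        out.extend(captures)
    return out


# Whitespace split is a simplification of the uax_url_email tokenizer.
tokens = "my email id is someone@gmail.com".split()
result = pattern_capture(tokens, ["@(.+)"])
print(result)
```

If that assumption holds, the non-email words have to be dropped by a
separate step before or after the capture filter, since the capture filter
itself never discards a token.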
