How to train a Tokenizer for emails?

Damiano Porta Mon, 29 Aug 2016 06:11:22 -0700

Hello,
I am creating a custom tokenizer. It works pretty well, but I have problems
with emails.
Emails can contain the characters _ - . which are split into separate
tokens in normal text, so the question is: how can I train it better?
After tokenization I need to apply different regexes to extract
emails, dates, and telephone numbers, so I must not tokenize such patterns.


Thanks
Damiano
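
One possible way to keep such patterns intact, sketched here as an illustration only: instead of retraining, a rule-based pass can match the protected patterns before the generic word and punctuation rules. This is not the tokenizer from the post. The pattern names (EMAIL, DATE, PHONE) and the tokenize function are hypothetical, and the regexes are deliberately simplified assumptions that will not cover every real-world format.

```python
import re

# Hypothetical, simplified patterns (assumptions for illustration only).
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
DATE = r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"
PHONE = r"\+?\d[\d -]{6,}\d"
WORD = r"\w+"          # generic word token
PUNCT = r"[^\w\s]"     # any single non-word, non-space character

# Order matters: protected patterns are tried before WORD and PUNCT,
# so an email is emitted as one token instead of being split on _ - .
TOKEN_RE = re.compile("|".join([EMAIL, DATE, PHONE, WORD, PUNCT]))


def tokenize(text):
    """Return tokens, keeping emails, dates, and phone numbers whole."""
    return [m.group(0) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    print(tokenize("Write to john_doe-1@example.com, thanks."))
```

With this ordering, the later extraction regexes can be applied to single tokens. If the trained tokenizer has to stay, the same patterns could instead be used to mask the matches before tokenization and restore them afterwards.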
