Hi William!
Yeah, I will go with a custom generator that adds specific features for these
patterns (email, telephone, dates, etc.).
Out of curiosity, how can I get the list of features for a specific token?
Thanks!
Damiano
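
A minimal sketch of the custom-generator idea, in Python and independent of any tokenizer library: a hypothetical feature generator that, for a candidate split position inside a whitespace-delimited token, returns two character-context features plus one flag per pattern (email, phone, date) that matches the whole token. The regexes, the feature names (`prev=`, `next=`, `is_email`, ...) and `token_features` itself are illustrative assumptions, not OpenNLP's API. Calling the generator directly on the token you care about is one simple way to see its full feature list.

```python
import re
from typing import List

# Hypothetical patterns, deliberately simple; real email/phone/date formats vary widely.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d{2,4}[.-]?\d{6,10}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4}"),
}


def token_features(token: str, index: int) -> List[str]:
    """Return the features for a candidate split just before position `index` in `token`.

    Hypothetical generator: two character-context features, plus an
    `is_<pattern>` flag for every pattern that matches the whole token.
    """
    features = [
        "prev=" + (token[index - 1] if index > 0 else "^"),
        "next=" + (token[index] if index < len(token) else "$"),
    ]
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(token):
            features.append("is_" + name)
    return features


if __name__ == "__main__":
    # Print the full feature list at every candidate split position of one token.
    token = "damiano_porta@example.com"
    for i in range(1, len(token)):
        print(i, token_features(token, i))
```

For example, `token_features("damiano_porta@example.com", 7)` gives `['prev=o', 'next=_', 'is_email']`, so a model trained with such features can learn not to split when `is_email` is present. Where exactly such a flag plugs into OpenNLP's tokenizer is worth checking in its documentation.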


2016-09-08 1:46 GMT+02:00 William Colen <[email protected]>:

> Have you trained with enough examples of emails?
> Some tools have a sequence validator, but I think the tokenizer doesn't
> have one. If it did, you could create one that recognizes this.
> Another option would be to customize the feature generator to add a special
> feature when the token looks like an email or telephone.
>
>
> Regards
> William
>
>
> On Monday, August 29, 2016, Damiano Porta <
> [email protected]> wrote:
>
> > Hello,
> > I am creating a custom tokenizer. It works pretty well, but I have problems
> > with emails.
> > Emails can contain _, - and . characters, which are split off in normal
> > text, so the question is: how can I train it better?
> > After tokenization I need to apply different regexes to extract
> > emails/dates/telephone numbers, so I must not tokenize such patterns.
> >
> > Thanks
> > Damiano
> >
>
>
> --
> William Colen
>
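
William's sequence-validator idea can also be sketched outside any library. The toy Python tokenizer below proposes a split on both sides of every `_`, `-`, `.` and `@`, and a hypothetical `split_allowed` check vetoes any split that falls strictly inside an email match, so emails survive as single tokens for the later regex step. The split characters, the email regex and both function names are assumptions for illustration only; if the tokenizer really has no validator hook, the same check could instead be applied as a post-processing step.

```python
import re
from typing import List

# Hypothetical pattern for spans that must never be split.
PROTECTED = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Characters this toy tokenizer would normally split around (an assumption).
SPLIT_CHARS = set("_-.@")


def split_allowed(chunk: str, index: int) -> bool:
    """Veto a split before `index` if it falls strictly inside a protected span."""
    for m in PROTECTED.finditer(chunk):
        if m.start() < index < m.end():
            return False
    return True


def tokenize(chunk: str) -> List[str]:
    """Propose a split on both sides of each SPLIT_CHARS character,
    then keep only the splits the validator allows."""
    candidates = set()
    for i, ch in enumerate(chunk):
        if ch in SPLIT_CHARS:
            candidates.update((i, i + 1))
    cuts = sorted(c for c in candidates if 0 < c < len(chunk) and split_allowed(chunk, c))
    bounds = [0] + cuts + [len(chunk)]
    return [chunk[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


if __name__ == "__main__":
    for chunk in ["foo-bar", "damiano.porta@example.com."]:
        print(chunk, "->", tokenize(chunk))
```

Here `tokenize("foo-bar")` still splits into `['foo', '-', 'bar']`, while `tokenize("damiano.porta@example.com.")` keeps the address whole and only splits off the sentence-final period: `['damiano.porta@example.com', '.']`.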
