Re: How to train a Tokenizer for emails ?

William Colen Sat, 10 Sep 2016 14:47:14 -0700

When I need to, I debug the code. I don't know if there is a better way.


2016-09-10 18:24 GMT-03:00 Damiano Porta <[email protected]>:

> Hi William!
> Yeah, I will go with a custom generator that adds specific features for these
> patterns (email, telephone, dates, etc.).
> Out of curiosity, how can I get the list of features of a specific token?
> Thanks!
> Damiano
>
>
> 2016-09-08 1:46 GMT+02:00 William Colen <[email protected]>:
>
> > Have you trained with enough examples of emails?
> > Some tools have a sequence validator, but I don't think the tokenizer has
> > one. If it did, you could create one that would recognize this.
> > Another option would be to customize the feature generator to add a special
> > feature when the token looks like an email or telephone.
> >
> >
> > Regards
> > William
> >
> >
> > On Monday, 29 August 2016, Damiano Porta <
> > [email protected]> wrote:
> >
> > > Hello,
> > > I am creating a custom tokenizer. It works pretty well, but I have problems
> > > with emails.
> > > Emails can contain _ - . characters, which are split in normal text, so the
> > > question is, how can I train it better?
> > > After tokenization I need to apply different regexes to extract
> > > emails/dates/telephones, so such patterns must not be split into tokens.
> > >
> > > Thanks
> > > Damiano
> > >
> >
> >
> > --
> > William Colen
> >
>
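
A minimal sketch of the two ideas in this thread: adding a special feature
when a token looks like an email, telephone or date, and printing the features
of each token while debugging. It is plain Python, not the API of any
tokenizer toolkit. The regexes, the "looks_like=..." feature names and both
function names are assumptions made up for illustration, and the patterns are
deliberately simple, so real data will need stricter ones.

```python
import re

# Hypothetical, deliberately loose patterns (assumptions, not from the thread).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
DATE_RE = re.compile(r"\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def pattern_features(chunk):
    """Return extra features for one whitespace-delimited chunk of text."""
    features = []
    if EMAIL_RE.fullmatch(chunk):
        features.append("looks_like=email")
    # A date such as 2016-09-10 also fits the loose phone pattern,
    # so the date check runs first and the phone check is skipped if it hits.
    if DATE_RE.fullmatch(chunk):
        features.append("looks_like=date")
    elif PHONE_RE.fullmatch(chunk):
        features.append("looks_like=phone")
    return features


def debug_features(text):
    """Pair every whitespace-delimited chunk with its extra features."""
    return [(chunk, pattern_features(chunk)) for chunk in text.split()]


if __name__ == "__main__":
    for chunk, feats in debug_features("Write to a.b_c@example.org by 2016-09-10"):
        print(chunk, feats)
```

In a real tokenizer these strings would be appended to the features the
existing generator already emits for each candidate split point, so the model
can learn not to split inside such chunks. Telephone numbers written with
spaces span several whitespace chunks and are not covered by this sketch.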
