Re: How to train a Tokenizer for emails ?

Damiano Porta Sat, 10 Sep 2016 14:50:26 -0700

ok, thanks!

2016-09-10 23:46 GMT+02:00 William Colen <[email protected]>:


> When I need to, I debug the code. I don't know if there is a better way.
>
>
> 2016-09-10 18:24 GMT-03:00 Damiano Porta <[email protected]>:
>
> > Hi William!
> > Yeah, I will go with a custom feature generator that adds specific
> > features for these patterns (email, telephone, dates, etc.).
> > Out of curiosity, how can I get the list of features for a specific
> > token?
> > Thanks!
> > Damiano
> >
> >
> > 2016-09-08 1:46 GMT+02:00 William Colen <[email protected]>:
> >
> > > Have you trained with enough examples of emails?
> > > Some tools have a sequence validator, but I don't think the tokenizer
> > > has one. If it did, you could create one that recognizes this.
> > > Another option would be to customize the feature generator to add a
> > > special feature when the token looks like an email or telephone number.
> > >
> > >
> > > Regards
> > > William
> > >
> > >
> > > On Monday, 29 August 2016, Damiano Porta <
> > > [email protected]> wrote:
> > >
> > > > Hello,
> > > > I am creating a custom tokenizer. It works pretty well, but I have
> > > > problems with emails.
> > > > Emails can contain the characters _ - . which are split into separate
> > > > tokens in normal text, so the question is: how can I train it better?
> > > > After tokenization I need to apply different regexes to extract
> > > > emails/dates/telephone numbers, so such patterns must not be split.
> > > >
> > > > Thanks
> > > > Damiano
> > > >
> > >
> > >
> > > --
> > > William Colen
> > >
> >
>
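
A minimal sketch of the idea suggested in the thread: adding a special feature when a token looks like an email, telephone number, or date. It is plain Python, not the tokenizer's own API. The regexes, the function name pattern_features, and the feature strings are all assumptions made up for illustration; in practice the same checks would presumably sit inside a custom feature generator, and the patterns would need tuning against real data.

```python
import re

# Illustrative patterns only (assumptions): real emails, phone numbers
# and dates vary far more than these expressions allow for.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")
PHONE_RE = re.compile(r"^\+?\d[\d\-. ]{5,}\d$")
DATE_RE = re.compile(r"^\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}$")


def pattern_features(token):
    """Return extra feature strings for a whitespace-delimited chunk.

    The feature names are hypothetical; a statistical tokenizer could use
    them as a hint that the chunk should not be split on '_', '-' or '.'.
    """
    features = []
    if EMAIL_RE.match(token):
        features.append("looks_like_email")
    # Check dates before phones, since a date such as 2016-09-10 would
    # also satisfy the looser phone pattern.
    if DATE_RE.match(token):
        features.append("looks_like_date")
    elif PHONE_RE.match(token):
        features.append("looks_like_phone")
    return features


if __name__ == "__main__":
    for chunk in ["john_doe-1@example.com", "2016-09-10", "+39-333-1234567", "hello"]:
        print(chunk, pattern_features(chunk))
```

The features only help if the training data contains enough examples where such chunks are left unsplit, which matches the earlier question about training with enough email examples.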
