When I need to see them, I debug the code. I don't know if there is a better way.
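
For the custom features discussed in the thread below, here is a minimal sketch of the idea in plain Java. It only shows how a token could be mapped to a "looks like an email/date/telephone" feature and how the resulting feature list can be printed for one token without a debugger. The class name `PatternFeatures`, the method `featuresFor`, the feature strings, and the three regexes are all hypothetical and deliberately loose; this does not use OpenNLP's own feature generator interfaces, so it would still need to be wrapped in whatever generator the model is trained with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PatternFeatures {
    // Illustrative patterns only; real data will need stricter ones.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+");
    private static final Pattern DATE =
        Pattern.compile("[0-9]{1,4}[-/.][0-9]{1,2}[-/.][0-9]{1,4}");
    private static final Pattern PHONE =
        Pattern.compile("\\+?[0-9][0-9 ()./-]{5,}[0-9]");

    // Returns the extra features (hypothetical names) for a single token.
    public static List<String> featuresFor(String token) {
        List<String> features = new ArrayList<>();
        if (EMAIL.matcher(token).matches()) {
            features.add("shape=email");
        }
        // A date also satisfies the loose phone pattern, so test it first.
        if (DATE.matcher(token).matches()) {
            features.add("shape=date");
        } else if (PHONE.matcher(token).matches()) {
            features.add("shape=phone");
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"john_doe@example.com", "2016-09-10", "+55 11 5555-0100", "hello"};
        // Print the feature list of each token, one per line.
        for (String token : tokens) {
            System.out.println(token + " -> " + featuresFor(token));
        }
    }
}
```

Printing the list this way is just a convenience for inspecting a specific token; it only covers the features this sketch adds, not the ones the default generators produce.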

2016-09-10 18:24 GMT-03:00 Damiano Porta <[email protected]>:

> Hi William!
> Yeah, I will go with a custom generator that adds specific features for these
> patterns (email, telephone, dates), etc.
> Out of curiosity, how can I get the list of features of a specific token?
> Thanks!
> Damiano
>
>
> 2016-09-08 1:46 GMT+02:00 William Colen <[email protected]>:
>
> > Have you trained with enough examples of emails?
> > Some tools have a sequence validator, but I think the tokenizer doesn't
> > have one. If it did, you could create one that would recognize this.
> > Another option would be to customize the feature generator to add a
> > special feature when the token looks like an email or telephone.
> >
> >
> > Regards
> > William
> >
> >
> > On Monday, 29 August 2016, Damiano Porta <[email protected]> wrote:
> >
> > > Hello,
> > > I am creating a custom tokenizer. It works pretty well, but I have
> > > problems with emails.
> > > Emails can contain _ - . characters, which are tokenized in normal
> > > text, so the question is: how can I train it better?
> > > After tokenization I need to apply different regexes to extract
> > > emails/dates/telephones, so I must not tokenize such patterns.
> > >
> > > Thanks
> > > Damiano
> > >
> >
> >
> > --
> > William Colen
> >
>
