Hi William!

Yes, I will go with a custom feature generator that adds specific features for these patterns (emails, telephone numbers, dates, etc.).
Out of curiosity, how can I get the list of features for a specific token?

Thanks!
Damiano
2016-09-08 1:46 GMT+02:00 William Colen <[email protected]>:

> Have you trained with enough examples of emails?
> Some tools have a sequence validator, but I don't think the tokenizer has
> one. If it did, you could create one that would recognize this.
> Another option would be to customize the feature generator to add a special
> feature when the token looks like an email or telephone number.
>
> Regards
> William
>
> On Monday, 29 August 2016, Damiano Porta <[email protected]> wrote:
>
> > Hello,
> > I am creating a custom tokenizer. It works pretty well, but I have
> > problems with emails.
> > Emails can contain _ - . characters, which are split off as tokens in
> > normal text, so the question is: how can I train it better?
> > After tokenization I need to apply different regexes to extract
> > emails/dates/telephone numbers, so such patterns must not be tokenized.
> >
> > Thanks
> > Damiano
>
> --
> William Colen
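
For illustration, the "special feature when the token looks like an email or telephone" idea from the thread could be sketched roughly as follows. This is a standalone sketch, not OpenNLP code: the class name `PatternFeatures`, the method names, the feature strings such as `pattern=email`, and the three regular expressions are all assumptions made for this example, and the regexes are deliberately simple. In a real setup the same logic would have to be placed inside whatever feature or context generator interface the tokenizer is configured with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of OpenNLP: adds a marker feature when a
// token looks like an email address, a date, or a telephone number.
class PatternFeatures {

    // Simplified patterns chosen for this sketch; real data will need tuning.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern DATE =
            Pattern.compile("\\d{1,2}[/.-]\\d{1,2}[/.-]\\d{2,4}|\\d{4}-\\d{2}-\\d{2}");
    private static final Pattern PHONE =
            Pattern.compile("\\+?\\d[\\d ()./-]{5,}\\d");

    // Appends at most one pattern feature for tokens[index] to the list.
    // Dates are checked before phones because the loose phone pattern
    // would also accept strings such as 2016-09-08.
    static void createFeatures(List<String> features, String[] tokens, int index) {
        String token = tokens[index];
        if (EMAIL.matcher(token).matches()) {
            features.add("pattern=email");
        } else if (DATE.matcher(token).matches()) {
            features.add("pattern=date");
        } else if (PHONE.matcher(token).matches()) {
            features.add("pattern=phone");
        }
    }

    // Convenience wrapper: returns the features generated for one token,
    // which is a simple way to inspect what this generator produces.
    static List<String> featuresFor(String[] tokens, int index) {
        List<String> features = new ArrayList<>();
        createFeatures(features, tokens, index);
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Write", "to", "john_doe-1@example.com", "before", "08/09/2016"};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " -> " + featuresFor(tokens, i));
        }
    }
}
```

Calling `featuresFor` on a single token, as `main` does, only lists the features this sketch generates; it does not show the features OpenNLP itself computes for a token.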
