ok, thanks!

2016-09-10 23:46 GMT+02:00 William Colen <[email protected]>:
> When I need to, I debug the code. I don't know if there is a better way.
>
> 2016-09-10 18:24 GMT-03:00 Damiano Porta <[email protected]>:
>
> > Hi William!
> > Yes, I will go with a custom generator that adds specific features for
> > these patterns (emails, telephone numbers, dates, etc.).
> > Out of curiosity, how can I get the list of features of a specific token?
> > Thanks!
> > Damiano
> >
> > 2016-09-08 1:46 GMT+02:00 William Colen <[email protected]>:
> >
> > > Have you trained with enough examples of emails?
> > > Some tools have a sequence validator, but I think the tokenizer doesn't
> > > have one. If it did, you could create one that would recognize this.
> > > Another option would be to customize the feature generator to add a
> > > special feature when the token looks like an email or telephone number.
> > >
> > > Regards
> > > William
> > >
> > > On Monday, 29 August 2016, Damiano Porta <[email protected]> wrote:
> > >
> > > > Hello,
> > > > I am creating a custom tokenizer. It works pretty well, but I have
> > > > problems with emails.
> > > > Emails can contain _ - . characters, which are split in normal text,
> > > > so the question is: how can I train it better?
> > > > After the tokenization I need to apply different regexes to extract
> > > > emails/dates/telephone numbers, so I must not tokenize such patterns.
> > > >
> > > > Thanks
> > > > Damiano
> > >
> > > --
> > > William Colen
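
For illustration, the "special feature when the token looks like an email or telephone" idea from the thread could look roughly like the sketch below. It is written in plain Python and does not use the OpenNLP API. The function name pattern_features, the feature strings and the three regexes are all assumptions made up for this example, and the patterns are deliberately simplistic. Printing the returned list per token is also one crude way to see which features a given token received.

```python
import re

# Illustrative patterns only (assumptions, not taken from the thread);
# real emails, dates and phone numbers need broader rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def pattern_features(token):
    """Return extra feature strings for a token that matches a known pattern.

    Hypothetical helper: a real feature generator would append these
    strings to the features it already produces for the token.
    """
    features = []
    if EMAIL_RE.fullmatch(token):
        features.append("pattern=email")
    elif DATE_RE.fullmatch(token):
        # Dates are checked before phones because an ISO date such as
        # 2016-09-10 also satisfies the loose phone pattern.
        features.append("pattern=date")
    elif PHONE_RE.fullmatch(token):
        features.append("pattern=phone")
    return features


# Inspect the features assigned to each candidate token.
for tok in ["Contact", "john_doe-1@example.com", "2016-09-10"]:
    print(tok, pattern_features(tok))
```

The checks use fullmatch so that a token only gets the feature when the whole token fits the pattern, not just a substring of it.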
