[jira] [Created] (OPENNLP-697) Tokenizer class is hardcoded in the DocumentSampleStream class.

Praveena B (JIRA) Fri, 16 May 2014 10:40:18 -0700

Praveena B created OPENNLP-697:
----------------------------------

             Summary: Tokenizer class is hardcoded in the DocumentSampleStream class.
                 Key: OPENNLP-697
                 URL: https://issues.apache.org/jira/browse/OPENNLP-697
             Project: OpenNLP
          Issue Type: Bug
          Components: Doccat, Tokenizer
    Affects Versions: 1.6.0
            Reporter: Praveena B



While training the DocumentCategorizerME, it is possible to set the type of
Tokenizer that the categorizer should use,
e.g. doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE);

However, the Tokenizer class is hardcoded to WhitespaceTokenizer in the
DocumentSampleStream class, so training samples are always whitespace-tokenized.
As a result, the default tokenizing behaviour cannot be changed, even after
setting a different tokenizer in the doccatFactory.
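To illustrate the mismatch, here is a minimal, self-contained sketch that does not use OpenNLP classes. The names `DocSampleSketch`, `Tok`, `WHITESPACE`, `SEMICOLON`, `Sample` and `parse` are hypothetical. The sketch assumes the usual doccat line format (category, whitespace, document text). It splits the category off at the first whitespace and passes only the remaining text to a tokenizer supplied by the caller, rather than a hardcoded whitespace tokenizer. This is one possible design, not the project's actual fix.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained sketch (not OpenNLP code): parse "category text" lines with an injectable tokenizer. */
final class DocSampleSketch {

    /** Hypothetical stand-in for a tokenizer with a tokenize(String) method. */
    interface Tok {
        String[] tokenize(String s);
    }

    /** Splits on runs of whitespace, analogous to the hardcoded WhitespaceTokenizer. */
    static final Tok WHITESPACE = s -> {
        String t = s.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    };

    /** Hypothetical semicolon tokenizer: splits on ';', trims pieces, drops empty ones. */
    static final Tok SEMICOLON = s -> {
        List<String> out = new ArrayList<>();
        for (String piece : s.split(";")) {
            String t = piece.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out.toArray(new String[0]);
    };

    /** A parsed training line: the category plus its document tokens. */
    static final class Sample {
        final String category;
        final String[] tokens;

        Sample(String category, String[] tokens) {
            this.category = category;
            this.tokens = tokens;
        }
    }

    /**
     * Splits off the category at the first whitespace (assumed line format),
     * then tokenizes only the remaining text with the supplied tokenizer.
     */
    static Sample parse(String line, Tok tokenizer) {
        String trimmed = line.trim();
        int split = indexOfWhitespace(trimmed);
        if (split < 0) {
            throw new IllegalArgumentException("Line has only a category: " + line);
        }
        String category = trimmed.substring(0, split);
        String[] tokens = tokenizer.tokenize(trimmed.substring(split + 1));
        if (tokens.length == 0) {
            throw new IllegalArgumentException("Line has no document tokens: " + line);
        }
        return new Sample(category, tokens);
    }

    /** Returns the index of the first whitespace character, or -1 if there is none. */
    private static int indexOfWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    private DocSampleSketch() {}
}
```

For the line `sports goal;penalty kick;extra time`, `parse(line, DocSampleSketch.WHITESPACE)` yields the tokens `goal;penalty`, `kick;extra` and `time`. That output mirrors the hardcoded whitespace behaviour. `parse(line, DocSampleSketch.SEMICOLON)` yields `goal`, `penalty kick` and `extra time`. In both cases the category is `sports`.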




--
This message was sent by Atlassian JIRA
(v6.2#6252)

