[ https://issues.apache.org/jira/browse/OPENNLP-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259748#comment-16259748 ]

ASF GitHub Bot commented on OPENNLP-1157:
-----------------------------------------

kottmann closed pull request #288: OPENNLP-1157: Remove tokenizer from doccat trainer cli
URL: https://github.com/apache/opennlp/pull/288
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java b/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java
index 8ebb5a840..b9c49817e 100644
--- a/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java
+++ b/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java
@@ -30,8 +30,6 @@
 import opennlp.tools.doccat.DocumentCategorizerME;
 import opennlp.tools.doccat.DocumentSample;
 import opennlp.tools.doccat.FeatureGenerator;
-import opennlp.tools.tokenize.Tokenizer;
-import opennlp.tools.tokenize.WhitespaceTokenizer;
 import opennlp.tools.util.ext.ExtensionLoader;
 import opennlp.tools.util.model.ModelUtil;
 
@@ -85,13 +83,6 @@ public void run(String format, String[] args) {
     CmdLineUtil.writeModel("document categorizer", modelOutFile, model);
   }
 
-  static Tokenizer createTokenizer(String tokenizer) {
-    if (tokenizer != null) {
-      return ExtensionLoader.instantiateExtension(Tokenizer.class, tokenizer);
-    }
-    return WhitespaceTokenizer.INSTANCE;
-  }
-
   static FeatureGenerator[] createFeatureGenerators(String featureGeneratorsNames) {
     if (featureGeneratorsNames == null) {
       return new FeatureGenerator[]{new BagOfWordsFeatureGenerator()};
diff --git a/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/TrainingParams.java b/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/TrainingParams.java
index 4c4f0df35..cb5a39b27 100644
--- a/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/TrainingParams.java
+++ b/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/TrainingParams.java
@@ -33,14 +33,8 @@
   @OptionalParameter
   String getFeatureGenerators();
 
-  @ParameterDescription(valueName = "tokenizer",
-      description = "Tokenizer implementation. WhitespaceTokenizer is used if not specified.")
-  @OptionalParameter
-  String getTokenizer();
-
   @ParameterDescription(valueName = "factoryName",
       description = "A sub-class of DoccatFactory where to get implementation and resources.")
   @OptionalParameter
   String getFactory();
-
 }


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Remove tokenizer param from doccat trainer cli
> ----------------------------------------------
>
>                 Key: OPENNLP-1157
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1157
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface, Doccat
>    Affects Versions: 1.8.3
>            Reporter: Joern Kottmann
>            Assignee: Joern Kottmann
>            Priority: Minor
>             Fix For: 1.8.4
>
>
> The parameter is not used for training after the tokenization support was 
> removed from doccat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
