[ 
https://issues.apache.org/jira/browse/OPENNLP-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262787#comment-16262787
 ] 

ASF GitHub Bot commented on OPENNLP-1155:
-----------------------------------------

kottmann commented on a change in pull request #290: OPENNLP-1155: Remove deprecated leipzig doccat format support
URL: https://github.com/apache/opennlp/pull/290#discussion_r152596285
 
 

 ##########
 File path: opennlp-tools/src/test/java/opennlp/tools/eval/SourceForgeModelEval.java
 ##########
 @@ -80,31 +88,105 @@
  */
 public class SourceForgeModelEval extends AbstractEvalTest {
 
+  private static class LeipzigTestSample {
+    private final List<String> text;
+
+    private LeipzigTestSample(String[] text) {
+      Objects.requireNonNull(text, "text must not be null");
+      this.text = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(text)));
+    }
+
+    public String[] getText() {
+      return text.toArray(new String[text.size()]);
+    }
+
+    @Override
+    public String toString() {
+
+      StringBuilder sampleString = new StringBuilder("eng");
+
+      sampleString.append('\t');
+
+      for (String s : text) {
+        sampleString.append(s).append(' ');
+      }
+
+      if (sampleString.length() > 0) {
+        // remove last space
+        sampleString.setLength(sampleString.length() - 1);
+      }
+
+      return sampleString.toString();
+    }
+  }
+
+  private static class LeipzigTestSampleStream extends FilterObjectStream<String, LeipzigTestSample> {
+
+    private final int sentencePerDocument;
+    private final Tokenizer tokenizer;
+
+    private LeipzigTestSampleStream(int sentencePerDocument, Tokenizer tokenizer, InputStreamFactory in)
+            throws IOException {
+      super(new PlainTextByLineStream(in, StandardCharsets.UTF_8));
+      this.sentencePerDocument = sentencePerDocument;
+      this.tokenizer = tokenizer;
+      System.setOut(new PrintStream(System.out, true, "UTF-8"));
 
 Review comment:
   Remove this line (the `System.setOut(...)` call); there is no need to set it here.
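
   One way to read this comment: `System.setOut(...)` replaces the process-wide standard output, so calling it from a stream constructor changes output handling for everything else running in the same JVM. A minimal sketch, assuming UTF-8 output is still wanted somewhere, is to wrap only the stream being written to. The class `Utf8OutputSketch` and the helper `utf8Printer` are hypothetical names for illustration; they are not part of OpenNLP or of this pull request.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8OutputSketch {

  // Hypothetical helper: wraps a single target stream in a UTF-8 PrintStream
  // and leaves the global System.out untouched.
  public static PrintStream utf8Printer(OutputStream target) {
    try {
      return new PrintStream(target, true, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // Every Java platform is required to support UTF-8.
      throw new IllegalStateException("UTF-8 not available", e);
    }
  }

  public static void main(String[] args) {
    PrintStream globalOut = System.out;

    // Write a sample line to a local buffer instead of reconfiguring System.out.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream out = utf8Printer(buffer);
    out.print("eng\tGr\u00fc\u00dfe");
    out.flush();

    String written = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    System.out.println("round trip ok: " + written.equals("eng\tGr\u00fc\u00dfe"));
    System.out.println("System.out unchanged: " + (System.out == globalOut));
  }
}
```

   The encoding choice stays local to the code that needs it, so constructing the sample stream has no side effects on other tests.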

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Remove deprecated leipzig doccat format support
> -----------------------------------------------
>
>                 Key: OPENNLP-1155
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1155
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Doccat, Formats
>            Reporter: Joern Kottmann
>            Assignee: Peter Thygesen
>            Priority: Minor
>             Fix For: 1.8.4
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
