[jira] [Commented] (OPENNLP-216) Add Detokenizer API section

ASF GitHub Bot (Jira) Wed, 16 Dec 2020 23:20:06 -0800


    [ 
https://issues.apache.org/jira/browse/OPENNLP-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250853#comment-17250853
 ]


ASF GitHub Bot commented on OPENNLP-216:
----------------------------------------

kinow commented on a change in pull request #388:
URL: https://github.com/apache/opennlp/pull/388#discussion_r544862453



##########
File path: opennlp-docs/src/docbkx/tokenizer.xml
##########
@@ -396,19 +396,78 @@ test -> NO_OPERATION
                        <![CDATA[
 He said "This is a test".]]>           
                </programlisting>
-               TODO: Add documentation about the dictionary format and how to 
use the API. Contributions are welcome.
                </para>
                <section id="tools.tokenizer.detokenizing.api">
                        <title>Detokenizing API</title>
-                       <para>TODO: Write documentation about the detokenizer 
api. Any contributions
-are very welcome. If you want to contribute please contact us on the mailing 
list
-or comment on the jira issue <ulink 
url="https://issues.apache.org/jira/browse/OPENNLP-216">OPENNLP-216</ulink>.</para>
+                       <para>
+                               The Detokenizer can be use to detokenize the 
tokens to String.
+                               To instantiate the Detokenizer (a rule based 
detokenizer)
+                               a DetokenizationDictionary (the rule of 
dictionary) must be created first.
+                               The following code sample shows how a rule 
dictionary can be loaded.
+                               <programlisting language="java">
+                                       <![CDATA[
+InputStream dictIn = new FileInputStream("latin-detokenizer.xml");
+DetokenizationDictionary dict = new DetokenizationDictionary(dictIn);]]>
+                               </programlisting>
+                               After the rule dictionary is loadeed the 
DictionaryDetokenizer can be instantiated.
+                               <programlisting language="java">
+                                       <![CDATA[
+Detokenizer detokenizer = new DictionaryDetokenizer(dict);]]>
+                               </programlisting>
+                               The detokenizer offers two detokenize 
methods,the first detokenize the input tokens into a String.
+                               <programlisting language="java">
+                                       <![CDATA[
+String[] tokens = new String[]{"A", "co", "-", "worker", "helped", "."};
+String sentence = detokenizer.detokenize(tokens, null);
+Assert.assertEquals("A co-worker helped.", sentence);]]>
+                               </programlisting>
+                               Tokens which are connected without a space 
inbetween can be spearated by a split marker.

Review comment:
       s/inbetween/in-between/?

##########
File path: opennlp-docs/src/docbkx/tokenizer.xml
##########
@@ -396,19 +396,78 @@ test -> NO_OPERATION
                        <![CDATA[
 He said "This is a test".]]>           
                </programlisting>
-               TODO: Add documentation about the dictionary format and how to 
use the API. Contributions are welcome.
                </para>
                <section id="tools.tokenizer.detokenizing.api">
                        <title>Detokenizing API</title>
-                       <para>TODO: Write documentation about the detokenizer 
api. Any contributions
-are very welcome. If you want to contribute please contact us on the mailing 
list
-or comment on the jira issue <ulink 
url="https://issues.apache.org/jira/browse/OPENNLP-216">OPENNLP-216</ulink>.</para>
+                       <para>
+                               The Detokenizer can be use to detokenize the 
tokens to String.

Review comment:
       s/The Detokenizer can be use/The Detokenizer can be used/




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Add Detokenizer API section
> ---------------------------
>
>                 Key: OPENNLP-216
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-216
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Jörn Kottmann
>            Priority: Major
>              Labels: help-wanted
>
> The documentation is lacking a section about the detokenizer API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
