[ 
https://issues.apache.org/jira/browse/OPENNLP-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796572#comment-17796572
 ] 

ASF GitHub Bot commented on OPENNLP-1526:
-----------------------------------------

rzo1 commented on code in PR #566:
URL: https://github.com/apache/opennlp/pull/566#discussion_r1426238045


##########
opennlp-tools/lang/es/abb_ES.xml:
##########
@@ -0,0 +1,236 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+-->
+
+<dictionary case_sensitive="false">
+  <entry>
+    <token>a.C.</token>
+  </entry>
+  <entry>
+    <token>a. de C.</token>
+  </entry>
+  <entry>
+    <token>a.J.C.</token>
+  </entry>
+  <entry>
+    <token>a. de J.C.</token>
+  </entry>
+  <entry>
+    <token>a. m.</token>
+  </entry>
+  <entry>
+    <token>apdo.</token>
+  </entry>
+  <entry>
+    <token>apdo.</token>
+  </entry>
+  <entry>
+    <token>aprox.</token>
+  </entry>
+  <entry>
+    <token>Av.</token>
+  </entry>
+  <entry>
+    <token>Avda.</token>
+  </entry>
+  <entry>
+    <token>Bs. As.</token>
+  </entry>
+  <entry>
+    <token>c.c.</token>
+  </entry>
+  <entry>
+    <token>cap.</token>
+  </entry>
+  <entry>
+    <token>D.</token>
+  </entry>
+  <entry>
+    <token>Da.</token>
+  </entry>
+  <entry>
+    <token>Dña.</token>
+  </entry>
+  <entry>
+    <token>d.C.</token>
+  </entry>
+  <entry>
+    <token>d. de C.</token>
+  </entry>
+  <entry>
+    <token>d.J.C.</token>
+  </entry>
+  <entry>
+    <token>d. de J.C.</token>
+  </entry>
+  <entry>
+    <token>dna.</token>
+  </entry>
+  <entry>
+    <token>EE. UU.</token>
+  </entry>
+  <entry>
+    <token>etc.</token>
+  </entry>
+  <entry>
+    <token>f.c.</token>
+  </entry>
+  <entry>
+    <token>F.C.</token>
+  </entry>
+  <entry>
+    <token>FF. AA.</token>
+  </entry>
+  <entry>
+    <token>Dr.</token>
+  </entry>
+  <entry>
+    <token>Dra.</token>
+  </entry>
+  <entry>
+    <token>Gob.</token>
+  </entry>
+  <entry>
+    <token>Lic.</token>
+  </entry>
+  <entry>
+    <token>Ing.</token>
+  </entry>
+  <entry>
+    <token>Pdte.</token>
+  </entry>
+  <entry>
+    <token>Pdta.</token>
+  </entry>
+  <entry>
+    <token>pág.</token>
+  </entry>
+  <entry>
+    <token>no.</token>

Review Comment:
   I asked myself a similar question: do we handle abbreviations without dots
at all? (The previous PR made an assumption about dots in the code.) While
playing with it, I noticed that it doesn't matter at all. I am fairly
confident that it would simply tokenize it as `n.°`, but confirming this with
a test case would be good, I guess.
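As a possible starting point for such a test, here is a minimal, JDK-only sketch of how entries in this XML format could be loaded and looked up when `case_sensitive="false"`. The class `AbbreviationLookupSketch` and its methods are hypothetical and are not OpenNLP's API. The sketch does not use OpenNLP's dictionary classes or tokenizer, so it cannot confirm how OpenNLP actually tokenizes `n.°`; a real test should exercise the OpenNLP tokenizer with `abb_ES.xml` loaded. The inline XML repeats `apdo.` just as the diff above does, which shows that such a duplicate collapses into a single entry in a set-based lookup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of OpenNLP): exact-match abbreviation lookup
// over a <dictionary><entry><token>...</token></entry></dictionary> document.
public class AbbreviationLookupSketch {

    private final Set<String> tokens = new LinkedHashSet<>();
    private final boolean caseSensitive;

    AbbreviationLookupSketch(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        // A missing attribute yields "", which this sketch treats as case-sensitive.
        caseSensitive = !"false".equalsIgnoreCase(root.getAttribute("case_sensitive"));
        NodeList nodes = root.getElementsByTagName("token");
        for (int i = 0; i < nodes.getLength(); i++) {
            // Duplicate entries (e.g. a repeated "apdo.") collapse in the set.
            tokens.add(normalize(nodes.item(i).getTextContent().trim()));
        }
    }

    // Lower-cases with Locale.ROOT only when the dictionary is case-insensitive.
    private String normalize(String s) {
        return caseSensitive ? s : s.toLowerCase(Locale.ROOT);
    }

    boolean isAbbreviation(String candidate) {
        return tokens.contains(normalize(candidate));
    }

    int size() {
        return tokens.size();
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes keep the source independent of the compiler's file encoding:
        // \u00f1 = 'ñ', \u00d1 = 'Ñ', \u00b0 = '°'.
        String xml = "<dictionary case_sensitive=\"false\">"
                + "<entry><token>apdo.</token></entry>"
                + "<entry><token>apdo.</token></entry>"
                + "<entry><token>D\u00f1a.</token></entry>"
                + "<entry><token>no.</token></entry>"
                + "</dictionary>";
        AbbreviationLookupSketch dict = new AbbreviationLookupSketch(xml);
        System.out.println(dict.size());                      // 3
        System.out.println(dict.isAbbreviation("D\u00d1A.")); // true
        System.out.println(dict.isAbbreviation("NO."));       // true
        // Exact matching only: "n.°" is not covered by the "no." entry here.
        System.out.println(dict.isAbbreviation("n.\u00b0"));  // false
    }
}
```

Exact matching keeps the sketch simple; whether OpenNLP's tokenizer keeps `n.°` together is a separate question that only a test against the real tokenizer can answer.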





> Add Spanish abbreviation dictionary
> -----------------------------------
>
>                 Key: OPENNLP-1526
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1526
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector, Tokenizer
>    Affects Versions: 2.3.1
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: abb_ES.xml
>
>          Time Spent: 1h
>  Remaining Estimate: 1h
>
> Similar to the addition in OPENNLP-570, an abbreviation dictionary for 
> Spanish sentence detection and tokenisation might be beneficial.
> Aims:
>  - Create and add a new file {{abb_ES.xml}} to _opennlp-tools/lang/es_
>  - Add basic set of test cases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
