[ https://issues.apache.org/jira/browse/OPENNLP-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069039#comment-16069039 ]
martin commented on OPENNLP-1099:
---------------------------------
I am reporting this to deeplearning4j. Hopefully they can switch to the OpenNLP
code from Apache rather than the old SourceForge version. Thanks.
> Is this a typical tokenization issue?
> -------------------------------------
>
> Key: OPENNLP-1099
> URL: https://issues.apache.org/jira/browse/OPENNLP-1099
> Project: OpenNLP
> Issue Type: Bug
> Components: Lemmatizer
> Reporter: martin
> Fix For: 1.8.1
>
>
> I am testing OpenNLP and found a significant tokenization issue involving
> punctuation. Sample inputs:
> Thank you Costco!
> i love costco!
> I love Costco!!
> FUCK IKEA.
> In all these cases, the final punctuation mark is not split off, so "Costco!"
> and "IKEA." are each treated as a single token. This looks like a systematic
> problem rather than a one-off.
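The report expects trailing punctuation to become its own token. As an illustration of those expected boundaries only, here is a minimal, stdlib-only Java sketch under one assumed rule: a run of letters/digits is a word, and every other non-whitespace character is a separate token. The class `PunctuationSplitter` is hypothetical; it is not OpenNLP's `TokenizerME`, which is a trained maximum-entropy model rather than a fixed rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical rule-based splitter showing the token boundaries the report
 * expects. NOT OpenNLP's TokenizerME (a trained model); illustration only.
 */
final class PunctuationSplitter {
    // A token is either a run of letters/digits, or a single character that is
    // neither a letter, a digit, nor whitespace (punctuation or a symbol).
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    private PunctuationSplitter() {}

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group()); // each match is one token, in input order
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] sentences = {"Thank you Costco!", "i love costco!", "I love Costco!!"};
        for (String s : sentences) {
            System.out.println(tokenize(s));
        }
    }
}
```

Running `main` prints `[I, love, Costco, !, !]` for the third sentence, and `tokenize("IKEA.")` yields `[IKEA, .]`. The rule is deliberately naive: it would also split "don't" into `don`, `'`, `t` and "3.5" into `3`, `.`, `5`, which are the kinds of cases a trained tokenizer is meant to handle.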
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)