[
https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868054#comment-16868054
]
ASF GitHub Bot commented on OPENNLP-1267:
-----------------------------------------
kottmann commented on issue #357: OPENNLP-1267 -- add a ProbingLanguageDetector
that can stop early.
URL: https://github.com/apache/opennlp/pull/357#issuecomment-503768785
We merge only one commit, so the changes need to be squashed, either by you
or by GitHub when we merge. I personally prefer that you send us a single
commit, but it doesn't matter much because the GH button can do that for us.
> Allow the LanguageDetector to stop before processing the full string
> --------------------------------------------------------------------
>
> Key: OPENNLP-1267
> URL: https://issues.apache.org/jira/browse/OPENNLP-1267
> Project: OpenNLP
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> On TIKA-2790, I found that Yalder stops after computing character
> ngrams on roughly the first 60 characters. That _likely_ explains its
> impressive speed. Let's make this "stopping short" feature available in
> OpenNLP.
>
> Ideally, the language detector wouldn't copy the full String, it wouldn't
> normalize the full String, and it wouldn't compute ngrams on the full String.
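
The "stopping short" idea in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ProbingLanguageDetector from the pull request: the class name PrefixNgrams, the method count, and the maxChars parameter are hypothetical, and normalization is left out. The sketch reads at most maxChars leading characters through the CharSequence interface, so the full input is never copied and ngrams are never computed past the cutoff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the OpenNLP API.
class PrefixNgrams {

    // Counts character n-grams of length n over at most maxChars leading
    // characters of text. Only short subsequences are materialized, so the
    // full input is never copied.
    static Map<String, Integer> count(CharSequence text, int n, int maxChars) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Stop at the cutoff or at the end of the input, whichever is first.
        int limit = Math.min(text.length(), Math.max(0, maxChars));
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= limit; i++) {
            String gram = text.subSequence(i, i + n).toString();
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }
}
```

A caller might invoke PrefixNgrams.count(text, 3, 60) to mirror the roughly 60 character cutoff mentioned above. A real detector would probably stop on a confidence threshold instead of a fixed length, which this sketch does not attempt.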