[jira] [Commented] (OPENNLP-788) Add a language detection component

Joern Kottmann (JIRA) Thu, 10 Sep 2015 02:09:07 -0700

    [ https://issues.apache.org/jira/browse/OPENNLP-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738452#comment-14738452 ]


Joern Kottmann commented on OPENNLP-788:
----------------------------------------

Maybe the following interface would be suitable:
public interface LanguageDetector {
  Language[] detectLanguage(CharSequence content);
  Set<String> getSupportedLanguages();
  String getLanguageCoding();
}

The doccat component can already do language detection with a custom factory. 
Maybe we can find a way to build the language detector on top of the doccat 
work. This would avoid a fair amount of code duplication.
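
A rough sketch of how such an adapter could look, to make the idea concrete.
Several things here are assumptions and not part of the proposal: the Language
type is not defined above, so it is modelled as a code plus a confidence
score; Categorizer is a hypothetical stand-in for a doccat-style classifier
and is not the OpenNLP API; getLanguageCoding() is read as "which code scheme
the returned language codes follow"; and the stop-word scorer is only a toy
that replaces a trained model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical value type; the comment above does not define Language.
final class Language {
  final String code;        // language code, e.g. "eng"
  final double confidence;  // score in [0, 1], higher means more likely

  Language(String code, double confidence) {
    this.code = code;
    this.confidence = confidence;
  }
}

// The interface as proposed in the comment above.
interface LanguageDetector {
  Language[] detectLanguage(CharSequence content);
  Set<String> getSupportedLanguages();
  String getLanguageCoding();
}

// Hypothetical stand-in for a doccat-style classifier (NOT the OpenNLP API):
// maps a text to one score per category, where each category is a language.
interface Categorizer {
  Map<String, Double> scoreCategories(String text);
}

// Adapter: exposes any Categorizer through the proposed interface.
final class CategorizerLanguageDetector implements LanguageDetector {
  private final Categorizer categorizer;
  private final Set<String> supported;
  private final String coding;

  CategorizerLanguageDetector(Categorizer categorizer, Set<String> supported, String coding) {
    this.categorizer = categorizer;
    this.supported = Collections.unmodifiableSet(new LinkedHashSet<>(supported));
    this.coding = coding;
  }

  @Override
  public Language[] detectLanguage(CharSequence content) {
    List<Language> result = new ArrayList<>();
    for (Map.Entry<String, Double> e : categorizer.scoreCategories(content.toString()).entrySet()) {
      result.add(new Language(e.getKey(), e.getValue()));
    }
    // Most likely language first.
    result.sort(Comparator.comparingDouble((Language l) -> l.confidence).reversed());
    return result.toArray(new Language[0]);
  }

  @Override
  public Set<String> getSupportedLanguages() { return supported; }

  @Override
  public String getLanguageCoding() { return coding; }
}

final class LanguageDetectorSketch {
  // Toy categorizer: counts a few stop words per language, add-one smoothed.
  static LanguageDetector newToyDetector() {
    Map<String, Set<String>> stopWords = new LinkedHashMap<>();
    stopWords.put("eng", new LinkedHashSet<>(Arrays.asList("the", "and", "is", "of")));
    stopWords.put("deu", new LinkedHashSet<>(Arrays.asList("der", "und", "ist", "das")));

    Categorizer toy = text -> {
      String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
      Map<String, Double> scores = new LinkedHashMap<>();
      double total = 0;
      for (Map.Entry<String, Set<String>> e : stopWords.entrySet()) {
        double hits = 1; // add-one smoothing so that no language scores zero
        for (String token : tokens) {
          if (e.getValue().contains(token)) hits++;
        }
        scores.put(e.getKey(), hits);
        total += hits;
      }
      for (Map.Entry<String, Double> e : scores.entrySet()) {
        e.setValue(e.getValue() / total); // normalize so the scores sum to 1
      }
      return scores;
    };
    return new CategorizerLanguageDetector(toy, stopWords.keySet(), "ISO 639-3");
  }

  public static void main(String[] args) {
    LanguageDetector detector = newToyDetector();
    for (Language l : detector.detectLanguage("der Hund und das Haus")) {
      System.out.println(l.code + " " + l.confidence);
    }
  }
}
```

With this shape the detector itself holds no classification logic, so a
doccat-backed implementation would only need to supply the Categorizer side.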

> Add a language detection component
> ----------------------------------
>
>                 Key: OPENNLP-788
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-788
>             Project: OpenNLP
>          Issue Type: Improvement
>            Reporter: Joern Kottmann
>
> Many of the components in OpenNLP are sensitive to the input language. It 
> would be nice if OpenNLP had a component to detect the language of an 
> input text.
> Two commonly used solutions today are:
> Apache Tika's Language Identifier
> Language Detection by Shuyo Nakatani



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
