[ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242989#comment-16242989
 ] 

Steve Rowe commented on SOLR-11592:
-----------------------------------

Hi Koji,

Looks good so far!  In addition to tests, the patch also needs documentation 
(in {{detecting-languages-during-indexing.adoc}}).

For IntelliJ to work with this patch, {{langid.iml}} needs a dependency on the 
{{analysis-common}} module:

{noformat}
+    <orderEntry type="module" module-name="analysis-common" />
{noformat}

About your TODO:

{code:java}
        // TODO: not sure *100 is appropriate...
        languages.add(new DetectedLanguage(language.getLang(), language.getConfidence() * 100));
{code}

{{DetectedLanguage.getCertainty()}} javadoc says:

{code:java}
  /**
   * Returns the detected certainty for this language
   * @return certainty as a value between 0.0 and 1.0 where 1.0 is 100% certain
   */
{code}

So I think {{*100}} is inappropriate: it would push the certainty outside the 
documented 0.0 to 1.0 range.
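To illustrate the point about the range, here is a minimal, self-contained sketch. It is not the patch code: the {{DetectedLanguage}} class below is a hypothetical stand-in reduced to two fields, {{toCertainty}} is a helper name I made up, and it assumes the detector's confidence is already a probability between 0.0 and 1.0, which should be verified against the OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;

public class CertaintySketch {

    // Hypothetical stand-in for Solr's DetectedLanguage, reduced to what matters here.
    public static final class DetectedLanguage {
        private final String langCode;
        private final double certainty;

        public DetectedLanguage(String langCode, double certainty) {
            this.langCode = langCode;
            this.certainty = certainty;
        }

        public String getLangCode() { return langCode; }

        // Mirrors the javadoc contract: a value between 0.0 and 1.0, where 1.0 is 100% certain.
        public double getCertainty() { return certainty; }
    }

    // Assumption: the detector's confidence is already in [0.0, 1.0], so it is
    // passed through unscaled. The clamp only guards against out-of-range input.
    public static double toCertainty(double confidence) {
        return Math.max(0.0, Math.min(1.0, confidence));
    }

    public static void main(String[] args) {
        List<DetectedLanguage> languages = new ArrayList<>();
        // No "* 100" here: the confidence is stored as-is.
        languages.add(new DetectedLanguage("en", toCertainty(0.87)));
        languages.add(new DetectedLanguage("de", toCertainty(0.09)));
        for (DetectedLanguage lang : languages) {
            System.out.println(lang.getLangCode() + " " + lang.getCertainty());
        }
    }
}
```

The clamp is optional. If the detector is known to stay within range, passing the confidence straight to the constructor is enough.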

> add another language detector using OpenNLP
> -------------------------------------------
>
>                 Key: SOLR-11592
>                 URL: https://issues.apache.org/jira/browse/SOLR-11592
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LangId
>    Affects Versions: 7.1
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>         Attachments: SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
