[ 
https://issues.apache.org/jira/browse/TIKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler reassigned TIKA-493:
--------------------------------

    Assignee: Ken Krugler

> Support for macro languages
> ---------------------------
>
>                 Key: TIKA-493
>                 URL: https://issues.apache.org/jira/browse/TIKA-493
>             Project: Tika
>          Issue Type: New Feature
>          Components: languageidentifier
>    Affects Versions: 0.7
>            Reporter: Jan Høydahl
>            Assignee: Ken Krugler
>
> Some languages have variants, and ISO codes exist both for the individual 
> variants and for the macro language that groups them. There should be a 
> way to tell whether the identified language belongs to a macro language and 
> to retrieve that macro language, because different applications need 
> different codes. For search, for example, it makes sense to tag the document 
> with both the specific code and the macro code.
> Example:
> Norwegian: no
> Norwegian bokmål: nb
> Norwegian nynorsk: nn
> The getLanguage() call should continue to return the most correct and 
> specific ISO code (according to which language profile matched).
> In addition, it should be possible to get the macro language.
> Proposed implementation:
> Add some new methods:
> public boolean hasMacroLanguage()   // true | false
> public String getMacroLanguage()    // for "nn" or "nb", the result would be "no"
> The definition of macro languages can be added in the property file 
> introduced in TIKA-490.
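
The proposal above could be sketched roughly as follows. This is only an
illustration, not Tika's actual LanguageIdentifier: the class name
MacroLanguageSketch and the "macro.<variant>=<macro>" property keys are
assumptions made for the example, since the layout of the TIKA-490 property
file is not described in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of the proposed methods; not part of Tika. */
public class MacroLanguageSketch {

    // Assumed key prefix; the real TIKA-490 property file may use another layout.
    private static final String PREFIX = "macro.";

    // Maps a variant code (e.g. "nb") to its macro language code (e.g. "no").
    private final Map<String, String> macroByVariant = new HashMap<>();

    // The specific code of the language profile that matched.
    private final String language;

    public MacroLanguageSketch(String language, Properties props) {
        this.language = language;
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                String variant = key.substring(PREFIX.length());
                macroByVariant.put(variant, props.getProperty(key).trim());
            }
        }
    }

    /** Returns the most specific code, unchanged by the macro lookup. */
    public String getLanguage() {
        return language;
    }

    /** True if the identified language is listed as a variant of a macro language. */
    public boolean hasMacroLanguage() {
        return macroByVariant.containsKey(language);
    }

    /** Returns the macro language code, or null if there is none. */
    public String getMacroLanguage() {
        return macroByVariant.get(language);
    }
}
```

With properties "macro.nb=no" and "macro.nn=no", an instance created for
"nn" would still report "nn" from getLanguage() and "no" from
getMacroLanguage(). Returning null when no macro language is defined is one
possible choice; the issue does not specify that case.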

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
