[ https://issues.apache.org/jira/browse/TIKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ken Krugler reassigned TIKA-493:
--------------------------------
Assignee: Ken Krugler
> Support for macro languages
> ---------------------------
>
> Key: TIKA-493
> URL: https://issues.apache.org/jira/browse/TIKA-493
> Project: Tika
> Issue Type: New Feature
> Components: languageidentifier
> Affects Versions: 0.7
> Reporter: Jan Høydahl
> Assignee: Ken Krugler
>
> Some languages have variants, and ISO codes exist both for the individual
> variants and for the macro language that groups them. There should be a way
> to tell whether the identified language belongs to a "macro language" and,
> if so, to return that macro language. Different applications need different
> codes: for search, for example, it makes sense to tag a document with both
> the specific code and the macro-language code.
> Example:
> Norwegian: no
> Norwegian bokmål: nb
> Norwegian nynorsk: nn
> The getLanguage() call should continue to return the most correct and
> specific ISO code, i.e. the code of the language profile that matched.
> In addition, it should be possible to get the macro language.
> Proposed implementation:
> Add some new methods:
> public boolean hasMacroLanguage() // true | false
> public String getMacroLanguage() // for "nn" or "nb", the result would be "no"
> The macro language definitions could be added to the property file
> introduced in TIKA-490.
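A minimal Java sketch of the proposal, under stated assumptions: IdentifiedLanguage, MacroLanguageSketch and searchTags() are hypothetical names, not part of Tika, and the "specific=macro" entry format (e.g. "nb=no") only stands in for the TIKA-490 property file, whose real format may differ. It mirrors the proposed no-argument hasMacroLanguage() and getMacroLanguage() methods and shows the search use case of tagging a document with both the specific and the macro code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class MacroLanguageSketch {

    // Hypothetical detection result; not an actual Tika class.
    static final class IdentifiedLanguage {
        private final String language;       // specific code, e.g. "nb"
        private final Properties macroTable; // assumed mapping: specific code -> macro code

        IdentifiedLanguage(String language, Properties macroTable) {
            this.language = language;
            this.macroTable = macroTable;
        }

        // Still returns the specific code of the matched profile.
        public String getLanguage() {
            return language;
        }

        // True if the specific code is listed as part of a macro language.
        public boolean hasMacroLanguage() {
            return macroTable.getProperty(language) != null;
        }

        // Macro language code, or null when there is none.
        public String getMacroLanguage() {
            return macroTable.getProperty(language);
        }

        // Codes to index for search: the specific code, then the macro code if any.
        public List<String> searchTags() {
            List<String> tags = new ArrayList<>();
            tags.add(language);
            if (hasMacroLanguage()) {
                tags.add(getMacroLanguage());
            }
            return tags;
        }
    }

    public static void main(String[] args) throws IOException {
        Properties macroTable = new Properties();
        // Assumed property-file content using the Norwegian example above.
        macroTable.load(new StringReader("nb=no\nnn=no\n"));

        for (String code : new String[] {"nb", "nn", "en"}) {
            IdentifiedLanguage result = new IdentifiedLanguage(code, macroTable);
            System.out.println(result.getLanguage() + " " + result.searchTags());
        }
    }
}
```

Run as-is, this prints "nb [nb, no]", "nn [nn, no]" and "en [en]"; keeping getLanguage() unchanged means existing callers still see the most specific code.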