[
https://issues.apache.org/jira/browse/TIKA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated TIKA-3620:
---------------------------------------
Fix Version/s: 2.2.0
> Language detection documentation needs attention
> ------------------------------------------------
>
> Key: TIKA-3620
> URL: https://issues.apache.org/jira/browse/TIKA-3620
> Project: Tika
> Issue Type: Improvement
> Components: languageidentifier
> Affects Versions: 2.1.0
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 2.2.0
>
>
> The language identifier/detection documentation suffers from a few problems:
> # Clarity is needed on identifier/identification vs. detector/detection. Which
> is it? The source code says identifier, whereas the [documentation is nested
> under
> detection|https://tika.apache.org/2.1.0/detection.html#Language_Detection].
> # The
> [org.apache.tika.language.LanguageIdentifier|https://tika.apache.org/2.1.0/api/org/apache/tika/language/LanguageIdentifier.html]
> link returns 404. What is this meant to resolve to?
> # Generally speaking, the [documentation is literally
> non-existent|https://tika.apache.org/2.1.0/detection.html#Language_Detection].
> I checked the wiki and failed to find anything. I did find some [minor
> documentation|https://tika.apache.org/2.1.0/examples.html#Language_Identification]
> but this is also severely lacking. Also note the broken hyperlink.
> Some suggestions for improvement:
> # Fix the broken hyperlinks.
> # Hyperlink to the existing examples, namely
> [LanguageDetectorExample.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/LanguageDetectorExample.java],
> [LanguageDetectingParser.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/LanguageDetectingParser.java]
> and
> [Language.java|https://github.com/apache/tika/blob/main/tika-example/src/main/java/org/apache/tika/example/Language.java].
> # Hyperlink to the [LanguageDetector
> Javadoc|https://tika.apache.org/2.1.0/api/index.html?org/apache/tika/language/detect/LanguageDetector.html]
> and at least mention some of the other implementations.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)