[ https://issues.apache.org/jira/browse/TIKA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604903#comment-17604903 ]
Tim Allison commented on TIKA-3850:
-----------------------------------
I concur with Nick. For kicks, I ran this with our OpenNLPDetector, and it
returned 'spa' (Spanish) as the most likely language.
> Spanish text is incorrectly detected as Galician
> ------------------------------------------------
>
> Key: TIKA-3850
> URL: https://issues.apache.org/jira/browse/TIKA-3850
> Project: Tika
> Issue Type: Bug
> Components: languageidentifier
> Affects Versions: 2.4.1
> Environment: org.apache.tika:tika-langdetect-optimaize:2.4.1
> org.apache.tika:tika-core:2.4.1
> Reporter: Lenne Hendrickx
> Priority: Minor
>
> The following Spanish text is incorrectly detected as Galician.
> {noformat}
> Hola! Donde puedo contactar para una garantía?
> {noformat}
> The es and gl models are loaded into the language detector.
> Language result:
> {noformat}
> language: gl
> score: 0.999995
> {noformat}