[jira] [Commented] (TIKA-3456) LanguageDetector should try to respect hasEnoughText more intelligently

Hudson (Jira) Mon, 28 Jun 2021 14:29:23 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370858#comment-17370858
 ]


Hudson commented on TIKA-3456:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #137 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/137/])
TIKA-3456 -- LanguageDetector should chunk long strings and test for 
hasEnoughText. (tallison: 
[https://github.com/apache/tika/commit/4ba5fd7eb8b1a6ccc45fd773b73e6f809a652370])
* (edit) tika-core/src/main/java/org/apache/tika/language/detect/LanguageDetector.java


> LanguageDetector should try to respect hasEnoughText more intelligently
> -----------------------------------------------------------------------
>
>                 Key: TIKA-3456
>                 URL: https://issues.apache.org/jira/browse/TIKA-3456
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 1.27
>
>
> If a user calls LanguageDetector's detect(String txt) or addText(String txt), 
> the full string is passed on to the subclass without any check on 
> hasEnoughText(). For large strings, LanguageDetector should break the 
> string into smaller chunks and check hasEnoughText() after each chunk.
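
The behaviour requested above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the linked commit: the class names ChunkingDetectorSketch and CountingDetector and the CHUNK_SIZE value are hypothetical, and the two abstract methods only mirror the general shape of a detector that accepts text and reports when it has seen enough.

```java
/** Hypothetical stand-in for a detector base class; not Tika's actual code. */
abstract class ChunkingDetectorSketch {
    // Assumed chunk size, chosen for illustration only.
    static final int CHUNK_SIZE = 5000;

    /** Subclasses consume a slice of characters. */
    abstract void addText(char[] cbuf, int off, int len);

    /** Subclasses report whether they have seen enough text to decide. */
    abstract boolean hasEnoughText();

    /**
     * Feeds the text to the subclass one chunk at a time and stops as soon
     * as the subclass reports that it has enough text.
     * Note: this naive split may cut a surrogate pair at a chunk boundary.
     */
    void addText(CharSequence text) {
        int len = text.length();
        int start = 0;
        while (start < len) {
            int end = Math.min(start + CHUNK_SIZE, len);
            char[] chunk = text.subSequence(start, end).toString().toCharArray();
            addText(chunk, 0, chunk.length);
            if (hasEnoughText()) {
                return; // skip the remainder of a long string
            }
            start = end;
        }
    }
}

/** Toy subclass: counts characters and is satisfied after a threshold. */
class CountingDetector extends ChunkingDetectorSketch {
    int charsSeen = 0;
    int calls = 0;
    final int threshold;

    CountingDetector(int threshold) {
        this.threshold = threshold;
    }

    @Override
    void addText(char[] cbuf, int off, int len) {
        charsSeen += len;
        calls++;
    }

    @Override
    boolean hasEnoughText() {
        return charsSeen >= threshold;
    }
}
```

With a 20,000-character input and a threshold of 8,000, the toy subclass receives only two chunks before the loop stops; without chunking it would receive all 20,000 characters in a single call.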



--
This message was sent by Atlassian Jira
(v8.3.4#803005)