[jira] [Commented] (TIKA-3515) Tika CLI -t should use UTF-8 as default output encoding

Hudson (Jira) Wed, 11 Aug 2021 10:03:07 -0700


    [ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397490#comment-17397490 ]


Hudson commented on TIKA-3515:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #305 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/305/])
TIKA-3515 -- Tika CLI -t should use UTF-8 as default output encoding (tallison: [https://github.com/apache/tika/commit/c792036e618f71fca851fd2ec90e8d23aaffd3d5])
* (edit) tika-core/src/main/java/org/apache/tika/sax/ToTextContentHandler.java
* (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* (edit) tika-parsers/tika-parsers-ml/tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JInceptionV3Net.java
* (edit) tika-core/src/test/java/org/apache/tika/sax/RichTextContentHandlerTest.java
* (edit) tika-parsers/tika-parsers-ml/tika-parser-nlp-module/src/test/java/org/apache/tika/parser/ner/NamedEntityParserTest.java
* (edit) CHANGES.txt
* (edit) tika-core/src/main/java/org/apache/tika/sax/WriteOutContentHandler.java
* (edit) tika-parsers/tika-parsers-ml/tika-age-recogniser/src/test/java/org/apache/tika/parser/recognition/AgeRecogniserTest.java
* (edit) tika-parsers/tika-parsers-ml/tika-parser-nlp-module/src/test/java/org/apache/tika/parser/sentiment/SentimentAnalysisParserTest.java


> Tika CLI -t should use UTF-8 as default output encoding
> -------------------------------------------------------
>
>                 Key: TIKA-3515
>                 URL: https://issues.apache.org/jira/browse/TIKA-3515
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 1.27
>         Environment: Windows 10, Liberica OpenJDK FULL x64 1.8.0_302
>            Reporter: Luís Filipe Nassif
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 2.0.1
>
>         Attachments: Korean lessons_ Lesson 2 – Learnkorean.com.pdf, 
> LIVE-Seoul-ntfs-utf-16-be.txt, LIVE-Seoul-ntfs-utf-16-le.txt, 
> LIVE-Seoul-ntfs-utf-8.txt, LIVE-Seoul-ntfs-utf-8.txt_-x_output.xml, 
> LIVE-Seoul-ntfs-utf-8_-t_output.txt, Screen Shot 2021-08-06 at 5.50.04 
> PM.png, Screen Shot 2021-08-06 at 5.50.21 PM.png, Screen Shot tika-app.png, 
> image-2021-08-09-14-37-30-552.png, image-2021-08-09-14-38-26-763.png
>
>
> Some Korean characters are extracted as squares. The encodings of the 
> plain-text files are detected correctly. Maybe this is related to the 
> content handler (just a guess). I'll attach the triggering files.
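
For readers of the archive, the effect named in the issue title can be reproduced outside Tika. The following is only a minimal sketch, not the code from the commit above: the class name EncodingSketch and the helper roundTrip are hypothetical, and ISO-8859-1 merely stands in for a legacy platform default charset that cannot represent Korean text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; this class is not part of Tika.
public class EncodingSketch {

    // Writes the text through an OutputStreamWriter that uses the given
    // charset, then decodes the resulting bytes with the same charset.
    static String roundTrip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        String korean = "\uD55C\uAD6D\uC5B4"; // the three Hangul syllables of "Korean language"

        // An explicit UTF-8 writer preserves the text.
        System.out.println(roundTrip(korean, StandardCharsets.UTF_8).equals(korean));

        // A single-byte charset (assumed stand-in for a platform default)
        // cannot encode Hangul, so the writer substitutes replacement bytes.
        System.out.println(roundTrip(korean, StandardCharsets.ISO_8859_1));
    }
}
```

With the UTF-8 writer the round trip returns the original string, while the single-byte charset yields one question mark per Hangul syllable. A viewer that renders lost characters as placeholder glyphs could plausibly show the squares described in the report, though the notification itself does not confirm that this was the exact mechanism.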



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
