[jira] [Commented] (TIKA-4493) Text extracted from PDF appears vertical when using Apache Tika

Tilman Hausherr (Jira) Wed, 24 Sep 2025 02:11:25 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022375#comment-18022375
 ]


Tilman Hausherr commented on TIKA-4493:
---------------------------------------

I assume that this is about page 8. You didn't say how you are using Tika, but 
here's the configuration of the PDF parser:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=109454066

Try setting detectAngles to true. Also update to the latest 3.0 version, unless 
you're sure that the latest CVE doesn't affect you.
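
For readers who configure Tika through a tika-config.xml file, the suggestion 
above might look roughly like the fragment below. This is only a sketch: it 
assumes the XML-based parser configuration described on the linked wiki page, 
and the file name tika-config.xml is an assumption, not something the reporter 
stated. The fragment excludes the PDF parser from the default parser set and 
registers it again with detectAngles switched on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a tika-config.xml; the file name and location are assumptions. -->
<properties>
  <parsers>
    <!-- Keep all default parsers except the PDF parser... -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- ...and register the PDF parser again with angle detection enabled. -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="detectAngles" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

If Tika is used from Java code instead, the same switch should be reachable 
through PDFParserConfig.setDetectAngles(true), with the config object passed to 
the parser via the ParseContext. How the configuration is actually wired in 
depends on how Tika is being called, which the report does not say.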

> Text extracted from PDF appears vertical when using Apache Tika
> ---------------------------------------------------------------
>
>                 Key: TIKA-4493
>                 URL: https://issues.apache.org/jira/browse/TIKA-4493
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.0
>            Reporter: sai krishna
>            Priority: Critical
>         Attachments: TIKA-4493-p8.pdf, extracted text.png, ord0120_AW.pdf
>
>
> When extracting text from a PDF file using Apache Tika, the output text is 
> rendered vertically instead of the expected horizontal layout. This issue 
> occurs consistently with the attached PDF file.
> I have attached the sample PDF and a screenshot of the extracted text for 
> reference.
> Please investigate why Tika is not preserving the correct text orientation 
> during extraction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
