[
https://issues.apache.org/jira/browse/TIKA-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022375#comment-18022375
]
Tilman Hausherr commented on TIKA-4493:
---------------------------------------
I assume this is about page 8. You didn't say how you are using Tika, but
here's the configuration of the PDF parser:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=109454066
Try setting detectAngles to true. And update to the latest 3.0 version, unless
you're not affected by the latest CVE.
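
Since the usage wasn't stated, here is only a minimal sketch of how detectAngles could be enabled, assuming Tika is driven by a tika-config.xml file. The layout follows the usual Tika 2.x parser configuration pattern (exclude PDFParser from DefaultParser, then declare it separately with params); please check the exact parameter name and type against the wiki page linked above for the version in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a tika-config.xml; assumes configuration via file, not code -->
<properties>
  <parsers>
    <!-- Keep all default parsers, but let the PDF parser be configured below -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- PDF parser with angle detection turned on for rotated text -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="detectAngles" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

If Tika is called from Java code instead, the equivalent would presumably be to set the same option on a PDFParserConfig and pass it in the ParseContext.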
> Text extracted from PDF appears vertical when using Apache Tika
> ---------------------------------------------------------------
>
> Key: TIKA-4493
> URL: https://issues.apache.org/jira/browse/TIKA-4493
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.9.0
> Reporter: sai krishna
> Priority: Critical
> Attachments: TIKA-4493-p8.pdf, extracted text.png, ord0120_AW.pdf
>
>
> When extracting text from a PDF file using Apache Tika, the output text is
> rendered vertically instead of the expected horizontal layout. This issue
> occurs consistently with the attached PDF file.
> I have attached the sample PDF and a screenshot of the extracted text for
> reference.
> Please investigate why Tika is not preserving the correct text orientation
> during extraction.