Samved Chandrakant Divekar created PDFBOX-5796:
--------------------------------------------------
Summary: PDFBox cannot extract vector text from a PDF
Key: PDFBOX-5796
URL: https://issues.apache.org/jira/browse/PDFBOX-5796
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.28
Environment: macOS Sonoma 14.4.1, OpenJDK 11 (can reproduce in other environments too)
Reporter: Samved Chandrakant Divekar
Attachments: Pre-flght_example.png, Sample_Working.png, Sample_not_Working.png

PDFBox does not extract any text from a PDF in which all text is encoded as vector objects.

Unfortunately, I cannot attach the original document here (confidentiality), but I have attached screenshots of the pre-flight analysis of a working file and a non-working file, produced with Adobe Acrobat Pro.

I can't copy and paste the text directly; however, Adobe's "Recognize Text" function works on the document. I verified that the whole page is not an image, and that all text is definitely encoded as vector objects. I have also attached an example of what the pre-flight analysis shows for a single letter.