[https://issues.apache.org/jira/browse/PDFBOX-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739722#comment-16739722]
Tilman Hausherr commented on PDFBOX-4431:
-----------------------------------------
Please use the current PDFBox version and explain exactly what you are doing
(with a small code sample that doesn't require us to install anything), what
you expected, and what happened instead. I have attached a text extraction of
your document.
> PDFBox recognizes only a few words
> ----------------------------------
>
> Key: PDFBOX-4431
> URL: https://issues.apache.org/jira/browse/PDFBOX-4431
> Project: PDFBox
> Issue Type: Bug
> Components: Documentation, Text extraction
> Environment: OS: Windows 10.
> IDE: Oxygen.3a Release (4.7.3a)
> PDF version: Adobe Acrobat Pro DC - 2019.010.20069.49826
> Reporter: Krutheeka Rajkumar
> Priority: Major
> Attachments: RS13170.pdf, RS13170.txt
>
>
> The code I have posted takes five arguments, including the location of a
> PDF document and a search term. It parses the PDF document, finds all
> matches for the keyword, and returns their locations in the format given
> by the last argument.
> For some reason, the code recognizes only a few words and fails on other
> words. I am not sure why.
> There seems to be no difference between these words in terms of font, size,
> location, etc.
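The reporter's program is not included here, so the following is only a rough, stdlib-only Java sketch of the search step described in the report. It assumes the text of each page has already been extracted (for example with PDFBox's PDFTextStripper, calling setStartPage/setEndPage and getText once per page). The class KeywordSearch and its method find are hypothetical names, not part of PDFBox, and the sketch reports a page number and character offset rather than the coordinate formats the reporter's last argument selects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: searches text that was already
// extracted from a PDF, one string per page.
public class KeywordSearch {

    // One hit: 1-based page number and character offset within that page's text.
    public static final class Match {
        public final int page;
        public final int offset;

        public Match(int page, int offset) {
            this.page = page;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return "page " + page + ", offset " + offset;
        }
    }

    // Returns every case-insensitive occurrence of keyword. Whitespace inside
    // the keyword matches any whitespace run, including line breaks, because
    // extracted text can split a phrase across lines.
    public static List<Match> find(List<String> pageTexts, String keyword) {
        List<Match> matches = new ArrayList<>();
        if (keyword == null || keyword.trim().isEmpty()) {
            return matches; // nothing to search for
        }
        StringBuilder regex = new StringBuilder();
        for (String word : keyword.trim().split("\\s+")) {
            if (regex.length() > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(word)); // treat each word literally
        }
        Pattern pattern = Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        for (int p = 0; p < pageTexts.size(); p++) {
            Matcher m = pattern.matcher(pageTexts.get(p));
            while (m.find()) {
                matches.add(new Match(p + 1, m.start()));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "Keyword search in PDF text",
                "The search\nterm may wrap across lines.");
        for (Match match : find(pages, "search term")) {
            System.out.println(match);
        }
    }
}
```

Running main prints `page 2, offset 4`: the space in the keyword matches the line break between "search" and "term" on the second page. Matching with `\s+` instead of a literal space is one way to tolerate such breaks in extracted text; comparing the search terms against the attached text extraction would show whether the failing words appear there in a different form.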
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)