[jira] [Updated] (PDFBOX-4431) PDFBox recognizes only a few words

Krutheeka Rajkumar (JIRA) Thu, 10 Jan 2019 11:38:24 -0800


     [ https://issues.apache.org/jira/browse/PDFBOX-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Krutheeka Rajkumar updated PDFBOX-4431:
---------------------------------------
    External issue URL: https://github.com/leslie-lau/fulltextsearch  (was: https://github.com/internetarchive/bookreader)

> PDFBox recognizes only a few words
> ----------------------------------
>
>                 Key: PDFBOX-4431
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4431
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Documentation, Text extraction
>         Environment: OS: Windows 10.
> IDE: Oxygen.3a Release (4.7.3a)
> PDF version: Adobe Acrobat Pro DC - 2019.010.20069.49826
>            Reporter: Krutheeka Rajkumar
>            Priority: Major
>         Attachments: RS13170.pdf
>
>
> The code I have posted takes 5 arguments, including the path to a PDF
> document and a search term. It is meant to parse the PDF document, find all
> matches of the search term, and return their locations in the format given
> by the last argument.
> For some reason the code recognizes only a few words and fails with errors
> on other words. I am not sure why this is.
> There seems to be no difference between these words in terms of font, size,
> location, etc.




