[
https://issues.apache.org/jira/browse/PDFBOX-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796754#action_12796754
]
Bernard commented on PDFBOX-586:
--------------------------------
Hi,
I have found something strange.
If I use the jar files (commons-logging-1.1.1.jar, fontbox-0.8.0-incubating.jar,
jempbox-1.0.0.jar, pdfbox-0.8.0-incubating.jar), text extraction works for the
1st page of a specific test file.
If I build from the source code of fontbox-0.8.0-incubating.jar, jempbox-1.0.0.jar
and pdfbox-0.8.0-incubating.jar instead, text extraction returns nothing for the
same 1st page. In the sources I have commented out or removed the encryption code
and the logging.
Everything is version 0.8.0.
Are you sure the published source code is the one that was used to build the .jar?
(I need the source code because I use PDFBox on Android devices, and I need
to remove everything unnecessary: tests, logs, printf, color management, font
management, ...). Including the .jar files directly would make the executable too big.
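
On the question of whether the sources match the jar, one rough check is to compare the class names packaged in the jar with the .java files in the source tree. The sketch below is only an illustration under some assumptions: the class name JarSourceCheck and its helper methods are hypothetical, it uses only the standard java.util.jar and java.nio.file APIs, and it compares names only, not the contents of the classes. A source file that declares several top-level classes will show up as "in jar only", so the output is a hint rather than proof.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Stream;

// Hypothetical helper: compares class names in a jar with a source tree.
final class JarSourceCheck {

    /** Converts "org/apache/Foo.class" (or ".java") to "org.apache.Foo". */
    static String toClassName(String relativePath, String suffix) {
        String trimmed = relativePath.substring(0, relativePath.length() - suffix.length());
        return trimmed.replace('\\', '/').replace('/', '.');
    }

    /** Top-level class names in the jar; nested classes (Foo$Bar) are skipped. */
    static Set<String> classesInJar(Path jar) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class") && !name.contains("$")) {
                    names.add(toClassName(name, ".class"));
                }
            }
        }
        return names;
    }

    /** Class names derived from the .java files below the given source root. */
    static Set<String> classesInSourceTree(Path root) throws IOException {
        Set<String> names = new TreeSet<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> p.toString().endsWith(".java"))
                 .forEach(p -> names.add(toClassName(root.relativize(p).toString(), ".java")));
        }
        return names;
    }

    /** Elements of 'left' that are absent from 'right'. */
    static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    // args[0] = path to the jar, args[1] = root directory of the sources
    public static void main(String[] args) throws IOException {
        Set<String> jar = classesInJar(Paths.get(args[0]));
        Set<String> src = classesInSourceTree(Paths.get(args[1]));
        System.out.println("In jar only: " + onlyIn(jar, src));
        System.out.println("In sources only: " + onlyIn(src, jar));
    }
}
```

With Java 11 or later this could be run as "java JarSourceCheck.java pdfbox-0.8.0-incubating.jar path/to/src", where path/to/src is a placeholder for the directory that holds the org/ package folders.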
> Text Extraction Regression ?
> ----------------------------
>
> Key: PDFBOX-586
> URL: https://issues.apache.org/jira/browse/PDFBOX-586
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Environment: Windows XP + Eclipse + PDFBox sources
> Reporter: Bernard
> Attachments: linux_programmer_guide.pdf
>
>
> Hi,
> I have noticed that I can extract text from some PDF files with PDFBox 0.7.3, but
> for the same file and the same page, PDFBox 0.8.0 doesn't retrieve any text.
> The text is in English (ASCII characters). Around 40% of my PDF files have
> this problem.
> Am I the only one who thinks there is a regression in text extraction?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.