[jira] Commented: (PDFBOX-586) Text Extraction Regression ?

JIRA Tue, 18 May 2010 12:06:07 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868790#action_12868790
 ]


Andreas Lehmkühler commented on PDFBOX-586:
-------------------------------------------

Works like a charm with 1.1.0 (using ExtractText -sort -encoding utf-8). My 
results are attached to this issue.

What exactly goes wrong when you try to extract the text? Do you get an 
exception? What are the differences between the older 0.7.4 results and those 
produced with the more recent version of PDFBox?
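
To make the differences between the two versions concrete, one option is to 
save the extracted text from each version to a file and compare the files line 
by line. The following is only a rough sketch using plain Java and no PDFBox 
classes; the class name ExtractionDiff, the method differingLines and the two 
file arguments are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper: compares two text extractions line by line.
public class ExtractionDiff {

    // Returns one entry per line position where the two extractions differ.
    // A position that exists in only one of the inputs is reported as "(no line)".
    static List<String> differingLines(List<String> oldLines, List<String> newLines) {
        List<String> report = new ArrayList<>();
        int max = Math.max(oldLines.size(), newLines.size());
        for (int i = 0; i < max; i++) {
            String oldLine = i < oldLines.size() ? oldLines.get(i) : null;
            String newLine = i < newLines.size() ? newLines.get(i) : null;
            if (!Objects.equals(oldLine, newLine)) {
                report.add("line " + (i + 1)
                        + ": old=" + describe(oldLine)
                        + " new=" + describe(newLine));
            }
        }
        return report;
    }

    private static String describe(String line) {
        return line == null ? "(no line)" : "[" + line + "]";
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ExtractionDiff <old-output.txt> <new-output.txt>");
            return;
        }
        // Both files are assumed to be UTF-8, matching the -encoding utf-8 option used above.
        List<String> oldLines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> newLines = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        for (String entry : differingLines(oldLines, newLines)) {
            System.out.println(entry);
        }
    }
}
```

An empty report would suggest both versions produce the same text for that 
file. Note that a purely positional comparison reports every following line as 
different once a single line is inserted or dropped, so a proper diff tool may 
be more readable for larger outputs.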

> Text Extraction Regression ?
> ----------------------------
>
>                 Key: PDFBOX-586
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-586
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.1.0
>         Environment: Windows XP + Eclipse + PDFBox sources
>            Reporter: Bernard
>         Attachments: ASEB-Camping_Car_ou_Bateau.pdf, Eval.pdf, internals.pdf, 
> PDFBOX586-ASEB-Camping_Car_ou_Bateau.txt, PDFBOX586-Eval.txt, 
> PDFBOX586-internals.txt
>
>
> Hi,
> I have noticed that I can extract text from some PDF files with PDFBox 0.7.4, 
> but for the same file and the same page, PDFBox 1.1.0 doesn't retrieve any 
> text, or the extraction is worse.
> Am I the only one who thinks there is a regression in text extraction?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
