Stefan Postema created PDFBOX-2545:
--------------------------------------

             Summary: ExtractText extracts filename and date
                 Key: PDFBOX-2545
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2545
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.8.7
            Reporter: Stefan Postema


With PDFBox 1.8 (and also a snapshot of 2.0.0), the ExtractText command-line 
tool produces text that also contains the original Adobe InDesign filename 
(along with the date and the images used).

Command line example:
java -jar pdfbox-app-2.0.0-SNAPSHOT.jar ExtractText 07-ALS-Onvoldoende-eten.pdf test.txt

The first lines of this test.txt file are:

VSN_Briefpapier_ontwerp_V03.indd   1 06-04-12   11:02
Wat kan ik doen als het niet lukt om voldoende te eten? ALS en voeding
Drinkvoeding

The output should not contain the line with the filename and date.

When copying and pasting the text from Adobe Reader, the InDesign filename 
does not show up. The CLI tool 'pdftotext' does not output the line with the 
filename either.
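
Until the extraction itself skips this line, one possible stopgap is to filter 
it out of the extracted text afterwards. The following is only a sketch using 
the Java standard library: the class name SlugFilter and the method 
stripSlugLines are hypothetical, and the pattern assumes that the unwanted 
line always looks like the one above (a name ending in .indd, a page number, 
a dd-mm-yy date and an hh:mm time). It does not address the underlying cause 
in PDFBox.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of PDFBox.
final class SlugFilter {

    // Assumed line shape: "<name>.indd   <page> <dd-mm-yy>   <hh:mm>"
    private static final Pattern SLUG = Pattern.compile(
            "^.+\\.indd\\s+\\d+\\s+\\d{2}-\\d{2}-\\d{2}\\s+\\d{1,2}:\\d{2}\\s*$");

    // Returns the text with every line that matches the assumed shape removed.
    static String stripSlugLines(String text) {
        return Arrays.stream(text.split("\\R"))
                .filter(line -> !SLUG.matcher(line).matches())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String extracted =
                "VSN_Briefpapier_ontwerp_V03.indd   1 06-04-12   11:02\n"
              + "Wat kan ik doen als het niet lukt om voldoende te eten? ALS en voeding\n"
              + "Drinkvoeding\n";
        System.out.println(stripSlugLines(extracted));
    }
}
```

Applied to the first lines of test.txt quoted above, this keeps the two text 
lines and drops the .indd line. Ordinary text that happens to have the same 
shape would be dropped as well, so the pattern is deliberately strict.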




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
