[jira] [Commented] (PDFBOX-3174) Get errormessage "FlateFilter: stop reading corrupt stream due to a DataFormatException" by extracting text from pdf-file

Josef Sigritz (JIRA) Tue, 29 Dec 2015 09:23:14 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074086#comment-15074086
 ]


Josef Sigritz commented on PDFBOX-3174:
---------------------------------------

Code sample for extracting text from a PDF file:

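import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

private void checkPdfErrors(byte[] pdfdata) {
        String parsedText = ""; //$NON-NLS-1$
        PDDocument pdDoc = null;
        try {
            // parse the in-memory PDF (PDFBox 1.8.x API)
            PDFParser parser = new PDFParser(new ByteArrayInputStream(pdfdata));
            parser.parse();
            COSDocument cosDoc = parser.getDocument();
            pdDoc = new PDDocument(cosDoc);

            // extract the text of all pages
            PDFTextStripper pdfStripper = new PDFTextStripper();
            parsedText = pdfStripper.getText(pdDoc);

            // internal checking of extracted text

        } catch (Exception e) {
            // exceptions are deliberately ignored in this sample
        } finally {
            // release the resources held by the document
            if (pdDoc != null) {
                try {
                    pdDoc.close();
                } catch (IOException ignored) {
                    // nothing useful can be done here
                }
            }
        }
}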
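
The internal check itself is not part of the sample above. Purely as an illustration of the kind of hyphenation check the issue describes, a minimal sketch might collect the words that are split by a hyphen at a line end in the extracted text. The class name HyphenationScan and the method lineEndHyphenations are hypothetical and do not come from the original code or from PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the reporter's code or of PDFBox.
public class HyphenationScan {

    /**
     * Returns every word that is split by a hyphen at the end of a line,
     * joined with its continuation on the next line (hyphen kept).
     */
    public static List<String> lineEndHyphenations(String text) {
        List<String> result = new ArrayList<String>();
        String[] lines = text.split("\\r?\\n");
        for (int i = 0; i + 1 < lines.length; i++) {
            String line = lines[i].trim();
            String next = lines[i + 1].trim();
            if (!line.endsWith("-") || next.isEmpty()) {
                continue;
            }
            // last token of this line, including the trailing hyphen
            String head = line.substring(line.lastIndexOf(' ') + 1);
            // first token of the following line
            int space = next.indexOf(' ');
            String tail = space < 0 ? next : next.substring(0, space);
            // only count it when letters sit on both sides of the break
            if (head.length() >= 2
                    && Character.isLetter(head.charAt(head.length() - 2))
                    && Character.isLetter(tail.charAt(0))) {
                result.add(head + tail);
            }
        }
        return result;
    }
}
```

The returned candidates could then be compared against whatever list of allowed break points the real check uses; that comparison is outside the scope of this sketch.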

> Get errormessage "FlateFilter: stop reading corrupt stream due to a 
> DataFormatException" by extracting text from pdf-file
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3174
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3174
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10
>         Environment: Windows 7/64bit, Java 1.7_67
>            Reporter: Josef Sigritz
>         Attachments: out1.pdf
>
>
> We generate PDF files from XML by transforming it to XSL-FO and converting 
> that to PDF with Antenna House. We want to check that the hyphenation in 
> the PDF is correct, so we extract the text from the PDF. Sometimes the text 
> extraction produces the error message "FlateFilter: stop reading corrupt 
> stream due to a DataFormatException".
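
The DataFormatException named in the message is presumably java.util.zip.DataFormatException, which java.util.zip.Inflater throws when compressed data is not valid zlib/deflate data. The following sketch uses only the JDK, not PDFBox, to show how a damaged Flate stream triggers that exception. The class name FlateCorruptionDemo and the way the stream is corrupted are assumptions made for illustration; they say nothing about what is actually wrong with the attached out1.pdf.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: plain JDK zlib handling, independent of PDFBox.
public class FlateCorruptionDemo {

    /** Compresses data in zlib format, the format used by FlateDecode streams. */
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses zlib data; throws DataFormatException on invalid input. */
    public static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // no progress and not finished: the stream is truncated
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or incomplete stream");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] compressed = deflate("sample text".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(inflate(compressed), StandardCharsets.UTF_8));

        // damage the zlib header so the stream is no longer valid
        byte[] corrupt = compressed.clone();
        corrupt[0] = 0;
        try {
            inflate(corrupt);
        } catch (DataFormatException e) {
            System.out.println("DataFormatException: " + e.getMessage());
        }
    }
}
```

The wording of the message ("stop reading corrupt stream") suggests that PDFBox logs the problem and stops reading the stream rather than propagating the exception, so a catch block in the calling code would not necessarily see it.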



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
