[ https://issues.apache.org/jira/browse/PDFBOX-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir updated PDFBOX-872:
----------------------------

       Priority: Critical  (was: Major)
    Description: 
This report: 
http://www2.goldmansachs.com/our-firm/press/press-releases/current/pdfs/2010-q2-earnings.pdf

With this code:
public static String getTransformed(InputStream inputStream) {
    PDDocument pdDocument = null;
    String document = null;
    try {
        PDFParser parser = new PDFParser(inputStream);
        parser.parse();

        pdDocument = parser.getPDDocument();

        PDFText2HTML pdf2html = new PDFText2HTML("UTF-8");
        document = pdf2html.getText(pdDocument);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (pdDocument != null) {
            try {
                pdDocument.getDocument().close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    return document;
}


returns:
17:01:15,609 [main] ERROR org.apache.pdfbox.filter.FlateFilter  - Stop reading corrupt stream
null
java.io.IOException: Error: Expected an integer type, actual=''
        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
        at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1112)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:591)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:246)
        at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:184)

The same file opens normally in Foxit PDF.

  was:
The same report, code, and output as above, except that the stack trace ended with two additional frames:
        at com.selerityfinancial.wwwscraper.utils.PDFUtil.getTransformed(PDFUtil.java:25)
        at com.selerityfinancial.wwwscraper.utils.PDFUtil.main(PDFUtil.java:55)


> ERROR org.apache.pdfbox.filter.FlateFilter  - Stop reading corrupt stream
> -------------------------------------------------------------------------
>
>                 Key: PDFBOX-872
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-872
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.3.1
>         Environment: Windows XP [Version 5.1.2600]
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)
>            Reporter: Vladimir
>            Priority: Critical

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
