[
https://issues.apache.org/jira/browse/PDFBOX-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933476#action_12933476
]
Vladimir edited comment on PDFBOX-872 at 11/18/10 12:04 PM:
------------------------------------------------------------
Another report:
http://www.salesforce.com/assets/pdf/investors/Q2FY11_Salesforce_FinancialResults.pdf
Has the same exception:
18:57:22,406 [pool-6-thread-1] ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
java.io.IOException: Error: Expected an integer type, actual=''
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1380)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:97)
    at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:483)
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1089)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:591)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:309)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:241)
    at java.lang.Thread.run(Thread.java:619)
was (Author: vladimir_postrigan):
Another report:
http://www.salesforce.com/assets/pdf/investors/Q2FY11_Salesforce_FinancialResults.pdf
Has the same exception:
18:57:22,406 [pool-6-thread-1] ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
java.io.IOException: Error: Expected an integer type, actual=''
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1380)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:97)
    at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:483)
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1089)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:591)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:309)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:241)
    at com.selerityfinancial.wwwscraper.utils.PDFUtil.getTransformed(PDFUtil.java:25)
    at com.selerityfinancial.wwwscraper.processor.processors.PdfProcessor.process(PdfProcessor.java:25)
    at com.selerityfinancial.wwwscraper.processor.ProcessorService.process(ProcessorService.java:27)
    at com.selerityfinancial.wwwscraper.job.JobExecutorOneWatch.processor(JobExecutorOneWatch.java:61)
    at com.selerityfinancial.wwwscraper.job.JobExecutorOneWatch.run(JobExecutorOneWatch.java:41)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
> ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
> -------------------------------------------------------------------------
>
> Key: PDFBOX-872
> URL: https://issues.apache.org/jira/browse/PDFBOX-872
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.3.1
> Environment: Windows XP [Version 5.1.2600]
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)
> Reporter: Vladimir
>
> This report:
> http://www2.goldmansachs.com/our-firm/press/press-releases/current/pdfs/2010-q2-earnings.pdf
> With this code:
> public static String getTransformed(InputStream inputStream) {
>     PDDocument pdDocument = null;
>     String document = null;
>     try {
>         PDFParser parser = new PDFParser(inputStream);
>         parser.parse();
>         pdDocument = parser.getPDDocument();
>         PDFText2HTML pdf2html = new PDFText2HTML("UTF-8");
>         document = pdf2html.getText(pdDocument);
>     } catch (IOException e) {
>         e.printStackTrace();
>     } finally {
>         if (pdDocument != null) {
>             try {
>                 pdDocument.getDocument().close();
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>     }
>     return document;
> }
> returns:
> 17:01:15,609 [main] ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream
> null
> java.io.IOException: Error: Expected an integer type, actual=''
>     at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
>     at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
>     at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
>     at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1112)
>     at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:591)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:246)
>     at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:184)
>     at com.selerityfinancial.wwwscraper.utils.PDFUtil.getTransformed(PDFUtil.java:25)
>     at com.selerityfinancial.wwwscraper.utils.PDFUtil.main(PDFUtil.java:55)
> The same file opens normally in Foxit PDF.
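
One possible reading of the trace, not confirmed anywhere in this report: the Flate decoder gives up on a stream it considers corrupt and hands back little or no data, and the object stream parser then fails when it tries to read the first integer from that empty result. The sketch below reproduces only that pattern with java.util.zip. It does not use PDFBox, the class and method names are hypothetical, and the exception message is copied from the log purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration; this is NOT PDFBox code.
final class TruncatedStreamSketch {

    /** Compresses bytes in zlib format, the format a Flate-encoded stream uses. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    /** Lenient decode: on a corrupt stream, log and return whatever was decoded so far. */
    static byte[] lenientInflate(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Assumed behaviour: swallow the error and keep the partial output.
            System.err.println("Stop reading corrupt stream");
        }
        return out.toByteArray();
    }

    /** Reads leading decimal digits; fails if there are none (message mimics the log). */
    static int readInt(InputStream in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isDigit(c)) {
            digits.append((char) c);
        }
        try {
            return Integer.parseInt(digits.toString());
        } catch (NumberFormatException e) {
            throw new IOException("Error: Expected an integer type, actual='" + digits + "'");
        }
    }

    public static void main(String[] args) throws IOException {
        // Intact stream: the leading integer is read back.
        byte[] good = deflate("12 0 13 25".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readInt(new ByteArrayInputStream(lenientInflate(good))));

        // Bytes that are not valid zlib data: the lenient decoder returns nothing,
        // and the integer read fails on the empty result.
        byte[] decoded = lenientInflate(new byte[] {0, 0, 0, 0});
        try {
            readInt(new ByteArrayInputStream(decoded));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, the IOException is a downstream symptom and the FlateFilter message is the place to start looking. The fact that another viewer opens the file suggests the stream itself may be valid, but the report does not establish that.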