[ https://issues.apache.org/jira/browse/PDFBOX-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871283#comment-13871283 ]

Andreas Lehmkühler commented on PDFBOX-1845:
--------------------------------------------

The second PDF was rearranged and is linearized, so its issue is similar but
not identical. Both PDFs use cross-reference (XRef) streams instead of a classic
XRef table, and I'm not sure whether PDFBox has a problem reading those or
whether the streams themselves are broken.

We'll have to investigate further.
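
One quick way to see which cross-reference form a file uses is to follow the
offset after the last `startxref` keyword and check whether it lands on the
`xref` keyword (classic table) or on an indirect object header `N G obj` (an
XRef stream). The class below is a rough heuristic sketch under that
assumption; `XrefKindSniffer` is a hypothetical helper, not part of PDFBox. It
only inspects the section referenced by the last `startxref`, and it neither
checks for `/Type /XRef` nor follows `/Prev` entries.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of PDFBox): heuristically reports whether the
 * section referenced by the last "startxref" is a classic xref table or an
 * XRef stream.
 */
public class XrefKindSniffer {

    // An XRef stream is an indirect object, so it starts with "<num> <gen> obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    /** Returns "table", "stream" or "unknown". */
    public static String detect(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return "unknown";
        }
        // Skip whitespace after the keyword, then read the decimal byte offset.
        int i = keyword + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int j = i;
        while (j < text.length() && text.charAt(j) >= '0' && text.charAt(j) <= '9') {
            j++;
        }
        if (i == j || j - i > 18) {
            return "unknown"; // no offset, or too many digits for a long
        }
        long offset = Long.parseLong(text.substring(i, j));
        if (offset >= text.length()) {
            return "unknown"; // offset points past the end of the file
        }
        int start = (int) offset;
        if (text.startsWith("xref", start)) {
            return "table";
        }
        if (OBJ_HEADER.matcher(text).region(start, text.length()).lookingAt()) {
            return "stream";
        }
        return "unknown";
    }

    public static void main(String[] args) throws Exception {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + ": " + detect(pdf));
    }
}
```

Running it as `java XrefKindSniffer "14 01 2014.pdf"` would print the file name
followed by `table`, `stream` or `unknown`.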

> PDDocument.load() give Error: Expected a long type at offset 1633
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-1845
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1845
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.0, 2.0.0
>         Environment: Windows 8.1
>            Reporter: David KELLER
>            Priority: Blocker
>         Attachments: 14 01 2014-2.pdf, 14 01 2014.pdf
>
>
> I run this simple program with the attached file (a scanned OCR document
> from Nuance OmniPage 18):
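>       import java.io.FileInputStream;
>       import java.util.List;
>       import org.apache.pdfbox.pdmodel.PDDocument;
>       import org.apache.pdfbox.pdmodel.PDPage;
>
>       public class SplitFileTest {
>           public static void main(String[] args) throws Exception {
>               System.out.println("Start SplitFileTest...");
>               String path = "D:\\test\\batch\\scan_manual\\courrier\\david.keller\\";
>               String pdfFile = path + "14 01 2014.pdf";
>
>               FileInputStream pdfInputStream = new FileInputStream(pdfFile);
>               try {
>                   // The exception below is thrown inside load()
>                   PDDocument pdDocument = PDDocument.load(pdfInputStream);
>                   try {
>                       List<PDPage> pages =
>                               pdDocument.getDocumentCatalog().getAllPages();
>                   } finally {
>                       pdDocument.close();
>                   }
>               } finally {
>                   pdfInputStream.close();
>               }
>           }
>       }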
> With version 1.8.0 I get this error:
> java.io.IOException: Error: Expected an integer type, actual='12977[373'
>         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
>         at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
>         at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:604)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187)
> I have also just built 2.0.0 from the latest source code, and I get this
> error:
> java.io.IOException: Error: Expected a long type at offset 1633
>         at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1682)
>         at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
>         at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:663)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1101)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
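
Both traces fail in `PDFObjectStreamParser.parse`, which reads the header of a
compressed object stream. Per the PDF specification (ISO 32000-1, 7.5.7), that
header is /N pairs of integers: an object number followed by a byte offset
relative to /First. One possible reading of the 1.8.0 token `12977[373`, only a
guess from the message text, is that the last header offset is followed
directly by an array object with no whitespace in between. The standalone
sketch below (the `ObjectStreamHeader` class is hypothetical, not PDFBox code)
parses such a header and ends each number at PDF whitespace or at a PDF
delimiter character, so `9[` splits into `9` and the start of an array.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (not PDFBox code): reads the header of a decoded object stream,
 * i.e. N pairs "objectNumber byteOffset" (ISO 32000-1, 7.5.7).
 */
public class ObjectStreamHeader {

    // PDF delimiter characters (ISO 32000-1, 7.2.2); they end a number token
    // even when no whitespace precedes them.
    private static final String DELIMITERS = "()<>[]{}/%";

    /** Returns n {objectNumber, offset} pairs read from the start of data. */
    public static List<long[]> parse(String data, int n) {
        List<long[]> entries = new ArrayList<long[]>();
        int[] pos = {0};
        for (int k = 0; k < n; k++) {
            long objNum = readLong(data, pos);
            long offset = readLong(data, pos);
            entries.add(new long[] {objNum, offset});
        }
        return entries;
    }

    /** PDF whitespace: NUL, TAB, LF, FF, CR and SPACE. */
    private static boolean isPdfWhitespace(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    /** Reads one integer token, stopping at whitespace or a delimiter. */
    private static long readLong(String s, int[] pos) {
        int i = pos[0];
        while (i < s.length() && isPdfWhitespace(s.charAt(i))) {
            i++;
        }
        int j = i;
        while (j < s.length() && !isPdfWhitespace(s.charAt(j))
                && DELIMITERS.indexOf(s.charAt(j)) < 0) {
            j++;
        }
        String token = s.substring(i, j);
        pos[0] = j;
        try {
            return Long.parseLong(token);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Expected an integer at offset " + i + ", actual='" + token + "'");
        }
    }

    public static void main(String[] args) {
        // Header "10 0 11 9" (so /First = 9) is followed directly by "[",
        // with object 10 at offset 0 and object 11 at offset 9.
        String data = "10 0 11 9[373 0 R]<</A 1>>";
        for (long[] e : parse(data, 2)) {
            System.out.println("object " + e[0] + " at offset " + e[1]);
        }
    }
}
```

A reader that split tokens only on whitespace would instead return `9[373` as
the fourth token and fail in the same way as the 1.8.0 message.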



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
