[ 
https://issues.apache.org/jira/browse/PDFBOX-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-1845.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0
                   1.8.6
         Assignee: Tilman Hausherr

An encoded stream has this content:

{code}
... 12873 279 12882 2 12977[373 0 R]<</Contents ...
{code}

BaseParser.readStringNumber() reads "12977[373" into its number buffer because 
no whitespace separates the number 12977 from the '[' that follows it. I have 
therefore added '[' as an extra delimiter. I have also improved the error 
message so that it shows the buffer contents the next time something like this 
happens.
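
To illustrate the idea, here is a minimal sketch of a number reader that stops 
at '[' as well as at whitespace and reports the offending buffer when parsing 
fails. It is not the actual BaseParser code: the class NumberTokenSketch and 
its methods endsNumber, readNumberToken and readInt are hypothetical names 
chosen for this example, and a real parser would need to handle the other PDF 
delimiter characters too.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Illustrative sketch only; this is NOT PDFBox's BaseParser.
class NumberTokenSketch {

    // Hypothetical delimiter test: end of input, whitespace or '[' ends a number.
    static boolean endsNumber(int c) {
        return c == -1 || Character.isWhitespace(c) || c == '[';
    }

    // Collects characters up to, but not including, the next delimiter.
    static String readNumberToken(PushbackReader in) throws IOException {
        StringBuilder buffer = new StringBuilder();
        int c;
        while (!endsNumber(c = in.read())) {
            buffer.append((char) c);
        }
        if (c != -1) {
            in.unread(c); // leave the delimiter for whoever reads the next token
        }
        return buffer.toString();
    }

    // Skips leading whitespace, reads one token and parses it as an int.
    static int readInt(PushbackReader in) throws IOException {
        int c;
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) {
            // skip whitespace between tokens
        }
        if (c != -1) {
            in.unread(c);
        }
        String token = readNumberToken(in);
        try {
            return Integer.parseInt(token);
        } catch (NumberFormatException e) {
            // Include the buffer contents so the failure is easy to diagnose.
            throw new IOException(
                "Error: Expected an integer type, actual='" + token + "'", e);
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in =
            new PushbackReader(new StringReader("12882 2 12977[373 0 R]"));
        System.out.println(readInt(in));       // 12882
        System.out.println(readInt(in));       // 2
        System.out.println(readInt(in));       // 12977
        System.out.println((char) in.read());  // [
    }
}
```

Without the '[' check in endsNumber, the third call would collect "12977[373" 
and fail in the same way as the reported stack trace.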

Regarding the JPEG2000 problem: this is a duplicate of PDFBOX-1752; see that 
issue for a solution: "I got it to work with pdfbox by using the three files from 
http://www.jpedal.org/download/jars/jai.zip and not just their bugfix of 
jai_imageio." But if I remember correctly, other JPEG2000 images still failed 
with that solution.

Fixed in rev 1592333 for the trunk and rev 1592334 for the 1.8 branch.

> PDDocument.load() give Error: Expected a long type at offset 1633
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-1845
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1845
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.0, 2.0.0
>         Environment: Windows 8.1
>            Reporter: David KELLER
>            Assignee: Tilman Hausherr
>            Priority: Blocker
>              Labels: JPEG2000, JPG2000
>             Fix For: 1.8.6, 2.0.0
>
>         Attachments: 14 01 2014-2.pdf, 14 01 2014.pdf
>
>
> I ran this simple program with the attached file (a scanned OCR document 
> from Nuance Omnipage 18):
>       public static void main(String[] args)
>       throws Exception {
>               System.out.println("Start SplitFileTest...");
>               String path = "D:\\test\\batch\\scan_manual\\courrier\\david.keller\\";
>               String pdfFile = path + "14 01 2014.pdf";
>               
>               FileInputStream pdfInputStream = new FileInputStream(pdfFile);
>               
>               PDDocument pdDocument = PDDocument.load(pdfInputStream);
>               List<PDPage> pages = pdDocument.getDocumentCatalog().getAllPages();
>               
>               pdfInputStream.close();
>       }
> With version 1.8.0 I get this error:
> java.io.IOException: Error: Expected an integer type, actual='12977[373'
>         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
>         at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
>         at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:604)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187)
> I have also just built 2.0.0 from the latest source code, and I get this 
> error:
>  java.io.IOException: Error: Expected a long type at offset 1633
>       at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1682)
>       at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
>       at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:663)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1101)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
