Re: Problem to parse a PDF document

pierre Thu, 14 Jun 2012 01:07:49 -0700

Many thanks, I have attached the file to the issue.

It now works fine for this kind of document, but there is a side effect
on other documents that worked fine before.


I receive the following error message:

Caused by: java.io.IOException: Error: Expected an integer type, actual='xref'
        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1541)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXrefObjStream(NonSequentialPDFParser.java:354)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:266)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:574)
        at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1124)
        at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1107)

If I use the PDDocument.load() method instead, I receive this warning message:

14 juin 2012 09:58:30 org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
ATTENTION: Did not found XRef object at specified startxref position 173

but the document is correctly loaded by PDFBox.
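
Since the confidential file cannot be shared, it may help to report what actually sits at the startxref offset. Below is a minimal sketch in plain Java (no PDFBox involved) that reads the offset written after the last "startxref" keyword and looks at the bytes found there. The class name StartXrefPeek and its two helper methods are made up for illustration, and the check is deliberately naive: it only tells a classic "xref" table apart from something that looks like an "N G obj" header, which is what a cross-reference stream would start with.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of PDFBox.
class StartXrefPeek {

    /** Returns the number written after the last "startxref" keyword, or -1 if none is found. */
    public static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf("startxref");
        if (pos < 0) {
            return -1;
        }
        int i = pos + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return i > start ? Long.parseLong(text.substring(start, i)) : -1;
    }

    /** Gives a rough description of what begins at the given byte offset. */
    public static String describe(byte[] pdf, long offset) {
        if (offset < 0 || offset >= pdf.length) {
            return "offset outside file";
        }
        int from = (int) offset;
        int len = Math.min(32, pdf.length - from);
        String head = new String(pdf, from, len, StandardCharsets.ISO_8859_1);
        if (head.startsWith("xref")) {
            return "classic xref table";
        }
        if (head.matches("(?s)\\d+\\s+\\d+\\s+obj.*")) {
            return "object (possibly an xref stream)";
        }
        return "neither";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StartXrefPeek <file.pdf>");
            return;
        }
        // Reads the whole file into memory; acceptable for a one-off check.
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        long offset = findStartXref(pdf);
        System.out.println("startxref = " + offset + " -> " + describe(pdf, offset));
    }
}
```

Run against one of the failing documents, a result of "neither" would at least be consistent with the warning above, without any page content having to leave the machine.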

I have a problem providing a sample file, because it contains some
confidential data.

Best regards

Pierre Huttin



On Thu, 14 Jun 2012 00:23:49 +0200, Timo Boehme
<[email protected]> wrote:
> Am 13.06.2012 14:02, schrieb [email protected]:
>> Sorry,
>>
>> Apparently the PDF was not correctly attached to the previous mail; I
>> have just zipped it and re-attached it.
>>
>> Pierre Huttin
> 
> With PDFBOX-1099 resolved
> (https://issues.apache.org/jira/browse/PDFBOX-1099) the page count is
> correct with both parsers (NonSequentialPDFParser and PDFParser).
> 
> For testing purposes it would be helpful to have your example PDF
> associated with PDFBOX-1099. Could you upload it to this issue (and
> tick the 'Grant license to ASF for inclusion in ASF works (as per the
> Apache License §5)' option), or give me permission to upload the file
> attached to your previous email with the license grant?
> 
> 
> Best regards,
> Timo
> 
>>
>> On Wed, 13 Jun 2012 13:56:50 +0200,<[email protected]>  wrote:
>>> Hello,
>>>
>>> I have some trouble with documents for which the library is not able
>>> to retrieve the number of pages and load them into the list using the
>>> PDDocument.getDocumentCatalog().getAllPages() method.
>>>
>>> The PDF file and the Java code to retrieve the number of pages are
>>> attached to this mail. It looks like the PDFParser does not correctly
>>> read the /Pages object; the page references are "8 0" and "19 0".
>>>
>>> I can open the document correctly with Adobe Reader and iText RUPS;
>>> both retrieve the correct number of pages: 2.
>>>
>>> I am running my code with version 1.7.0 of PDFBox.
>>>
>>> Thanks in advance for your help.
>>>
>>> Best regards
>>>
>>> Pierre Huttin
