Hi Tim,

I'll check those PDFs too. Thanks!

Andreas

On 31.07.2014 14:59, Allison, Timothy B. wrote:
Hi Andreas,

Not sure if these types of xref issues are what you mean, but this is what we
get on the Tika test PDFs (available here:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/):


Now testing: testComment.pdf
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
Now testing: testOptionalHyphen.pdf
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
Now testing: testPageNumber.pdf
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
Now testing: testPDFTwoTextBoxes.pdf
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
Now testing: testPDFVarious.pdf
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
  WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
Now testing: testPDF_acroform3.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 26441
Now testing: testPDF_childAttachments.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 2314576
Now testing: testPDF_protected.pdf
  INFO [main] (PDFParser.java:248) - Document is encrypted
Now testing: testPDF_twoAuthors.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 12324
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5969
Now testing: testPDF_Version.10.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5500
Now testing: testPDF_Version.6.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.7.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.8.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.9.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5687
Now testing: testPopupAnnotation.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 8777

-----Original Message-----
From: Andreas Lehmkuehler [mailto:[email protected]]
Sent: Wednesday, July 30, 2014 3:54 PM
To: [email protected]
Subject: Re: Broken XRef-links, looking for some sample pdfs

Thanks Tilman for the fast response and of course the pointers!

Andreas

On 30.07.2014 21:14, Tilman Hausherr wrote:
http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/

file 24, 024064.pdf
file 26, 026779.pdf
file 27, 027266.pdf, 027613.pdf
file 28, 048872.pdf
file 59, 059849.pdf

Additionally, there are the JIRA issues opened by William Palmer, and Tim
Allison once ran a long test whose CSV result file listed offset problems.
I don't remember the JIRA issue number.

Tilman

On 30.07.2014 20:59, Andreas Lehmkuehler wrote:
Hi,

I'm working on an advanced self-healing mechanism for wrong xref offset
values. I thought I had enough sample PDFs, but I can't find any.

Can anybody give me a pointer where to find some?

Thanks in advance!

BR
Andreas Lehmkühler
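
To illustrate what a "wrong xref offset" looks like in a file, here is a
minimal Python sketch. It is not PDFBox code and not the self-healing
mechanism discussed above; the function name check_startxref and the
nearest-keyword fallback are assumptions made for illustration only. It reads
the offset that follows the last "startxref" keyword and checks whether that
offset really points at an "xref" keyword or at an "N G obj" header (as used
by cross-reference streams).

```python
import re


def check_startxref(data: bytes):
    """Return (declared_offset, is_valid, nearest_xref_offset).

    Hypothetical helper for illustration; not part of PDFBox or Tika.
    """
    # The last "startxref" keyword is followed by the declared byte offset
    # of the cross-reference section.
    matches = list(re.finditer(rb"startxref\s+(\d+)", data))
    if not matches:
        return None, False, None
    declared = int(matches[-1].group(1))

    # A plausible target is either the "xref" keyword (classic table) or an
    # "N G obj" header (cross-reference stream).
    tail = data[declared:declared + 32]
    valid = tail.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", tail) is not None
    if valid:
        return declared, True, declared

    # Naive fallback (an assumption, not a real repair strategy): pick the
    # standalone "xref" keyword closest to the declared offset.
    candidates = [m.start() for m in re.finditer(rb"(?<!start)xref", data)]
    nearest = min(candidates, key=lambda pos: abs(pos - declared), default=None)
    return declared, False, nearest
```

Calling check_startxref(open(path, "rb").read()) on one of the sample files
would show whether the declared offset is off and, if so, where the nearest
"xref" keyword sits. A real repair would also have to verify the individual
object offsets inside the table, which this sketch does not attempt.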


