Hi Tim,
I'll check those PDFs too. Thanks!
Andreas
On 31.07.2014 14:59, Allison, Timothy B. wrote:
Hi Andreas,
I'm not sure whether these types of xref issues are what you mean, but this is what we
get on the Tika test PDFs (available here:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/):
Now testing: testComment.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
Now testing: testOptionalHyphen.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
Now testing: testPageNumber.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
Now testing: testPDFTwoTextBoxes.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
Now testing: testPDFVarious.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
Now testing: testPDF_acroform3.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 26441
Now testing: testPDF_childAttachments.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 2314576
Now testing: testPDF_protected.pdf
INFO [main] (PDFParser.java:248) - Document is encrypted
Now testing: testPDF_twoAuthors.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 12324
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5969
Now testing: testPDF_Version.10.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5500
Now testing: testPDF_Version.6.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.7.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.8.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.9.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5687
Now testing: testPopupAnnotation.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 8777
-----Original Message-----
From: Andreas Lehmkuehler [mailto:[email protected]]
Sent: Wednesday, July 30, 2014 3:54 PM
To: [email protected]
Subject: Re: Broken XRef-links, looking for some sample pdfs
Thanks, Tilman, for the fast response and, of course, the pointers!
Andreas
On 30.07.2014 21:14, Tilman Hausherr wrote:
http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
file 24, 024064.pdf
file 26, 026779.pdf
file 27, 027266.pdf, 027613.pdf
file 28, 048872.pdf
file 59, 059849.pdf
Additionally, there are the JIRA issues opened by William Palmer, and Tim
Allison once ran a long test whose CSV result file showed offset problems.
I don't remember which JIRA issue that was.
Tilman
On 30.07.2014 20:59, Andreas Lehmkuehler wrote:
Hi,
I'm working on an advanced self-healing mechanism for wrong xref offset
values. I thought I had enough sample PDFs, but I can't find any.
Can anybody point me to where I can find some?
Thanks in advance!
BR
Andreas Lehmkühler
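
For readers of this thread, here is a minimal sketch of what a self-healing step for a wrong xref offset might look like. It is not the PDFBox implementation. The class and method names (XrefOffsetSketch, findNearestXref, and so on) are hypothetical, and the sketch assumes the file uses a classic "xref" table rather than a cross-reference stream. If the keyword is not at the claimed offset, it scans the whole file and picks the "xref" keyword closest to that offset, skipping the "xref" inside "startxref".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not PDFBox code: repair a wrong xref offset by searching for the keyword.
final class XrefOffsetSketch {
    private static final byte[] XREF = "xref".getBytes(StandardCharsets.ISO_8859_1);
    private static final byte[] START = "start".getBytes(StandardCharsets.ISO_8859_1);

    // True if token occurs in data starting exactly at pos (false when out of bounds).
    static boolean matchesAt(byte[] data, int pos, byte[] token) {
        if (pos < 0 || pos + token.length > data.length) {
            return false;
        }
        for (int i = 0; i < token.length; i++) {
            if (data[pos + i] != token[i]) {
                return false;
            }
        }
        return true;
    }

    // True if "xref" starts at pos and is not the tail of "startxref".
    static boolean isXrefKeywordAt(byte[] pdf, int pos) {
        return matchesAt(pdf, pos, XREF) && !matchesAt(pdf, pos - START.length, START);
    }

    // Returns claimedOffset if it is valid, else the closest "xref" keyword, else -1.
    static int findNearestXref(byte[] pdf, int claimedOffset) {
        if (isXrefKeywordAt(pdf, claimedOffset)) {
            return claimedOffset;
        }
        int best = -1;
        for (int pos = 0; pos + XREF.length <= pdf.length; pos++) {
            if (!isXrefKeywordAt(pdf, pos)) {
                continue;
            }
            long distance = Math.abs((long) pos - claimedOffset);
            if (best < 0 || distance < Math.abs((long) best - claimedOffset)) {
                best = pos;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny made-up file whose startxref value (116) does not point at the table.
        String pdf = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\nxref\n0 1\n0000000000 65535 f \n"
                + "trailer\n<<>>\nstartxref\n116\n%%EOF\n";
        byte[] bytes = pdf.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("claimed offset 116, nearest xref keyword at "
                + findNearestXref(bytes, 116));
    }
}
```

A real mechanism would also have to validate the table found at the repaired offset and handle files with several incremental updates; the nearest-keyword rule here is only one possible heuristic.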