Hi Andreas,

Not sure if these types of xref issues are what you mean, but this is what we get on the Tika test PDFs (available here: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/):

Now testing: testComment.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 68229
Now testing: testOptionalHyphen.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 44785
Now testing: testPageNumber.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 51851
Now testing: testPDFTwoTextBoxes.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 56931
Now testing: testPDFVarious.pdf
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
WARN [main] (PDFParser.java:757) - Count in xref table is 0 at offset 205317
Now testing: testPDF_acroform3.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 26441
Now testing: testPDF_childAttachments.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 2314576
Now testing: testPDF_protected.pdf
INFO [main] (PDFParser.java:248) - Document is encrypted
Now testing: testPDF_twoAuthors.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 12324
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5969
Now testing: testPDF_Version.10.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5500
Now testing: testPDF_Version.6.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.7.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.8.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5592
Now testing: testPDF_Version.9.x.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 5687
Now testing: testPopupAnnotation.pdf
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 116
ERROR [main] (NonSequentialPDFParser.java:1904) - Can't find the object xref at offset 8777

-----Original Message-----
From: Andreas Lehmkuehler [mailto:[email protected]]
Sent: Wednesday, July 30, 2014 3:54 PM
To: [email protected]
Subject: Re: Broken XRef-links, looking for some sample pdfs

Thanks Tilman for the fast response and of course the pointers!

Andreas

On 30.07.2014 21:14, Tilman Hausherr wrote:
> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
>
> file 24, 024064.pdf
> file 26, 026779.pdf
> file 27, 027266.pdf, 027613.pdf
> file 28, 048872.pdf
> file 59, 059849.pdf
>
> Additionally, there are the JIRA issues opened by William Palmer; and Tim
> Allison had a long test once with a csv result file that had offset problems.
> Don't remember the jira issue.
>
> Tilman
>
> On 30.07.2014 20:59, Andreas Lehmkuehler wrote:
>> Hi,
>>
>> I'm working on an advanced self healing mechanism for wrong xref offset
>> values. I thought that I've enough sample pdfs but I can't find any.
>>
>> Can anybody give me a pointer where to find some?
>>
>> Thanks in advance!
>>
>> BR
>> Andreas Lehmkühler
>
