Can we see the actual PDF?

--Mark Storer
  Senior Software Engineer
  Cardiff.com

import legalese.Disclaimer;
Disclaimer<Cardiff> DisCard = null;

Autonomy Corp., an HP Company
> -----Original Message-----
> From: Kevin Day [mailto:[email protected]]
> Sent: Thursday, October 27, 2011 3:57 PM
> To: [email protected]
> Subject: [iText-questions] Content stream question
>
> I have an existing PDF that I'm trying to parse text out of,
> and am winding up with a null pointer exception when reading
> an array in the content stream.
>
> I have narrowed the problem down to a particular line in the
> content stream (if I run this one line through
> PdfContentParser.parse() it fails):
>
> Here is the line (sorry this is so ugly - I'll describe the
> exact location of the problem in a second):
>
> [(*)-15(*)-15(&+,)(*)-15(-./0,123/45)(*)-15(/.)(*)-15(/2+,.)(*)-15(346/.7823/4)(*)-15(9,4,.82,:)(*)-15(;<)(*)-15(&+,)(*)-15(!==>52.823/4)(*)-15(?,4,.82/.)(*)-20(.,98.:349)(*)-20(2+,)(*)-20(=3@,=3+//:)(*)-20(/6)(*)-20(A8.3/>5)(*)-20(34A,527,42)(*)-20(/>)[(*)-15(*)-15(&+,)(*)-15(-./0,123/45)(*)-15(/.)(*)-15(/2+,.)(*)-15(346/.7823/4)(*)-15(9,4,.82,:)(*)-15(;<)(*)-15(&+,)(*)-15(!==>52.823/4)(*)-15(?,4,.82/.)(*)-20(.,98.:349)(*)-20(2+,)(*)-20(=3@,=3+//:)(*)-20(/6)(*)-20(A8.3/>5)(*)-20(34A,527,42)(*)-20(/>)(21/7,5)(*)-20(8.,)(*)-20(+<-/2+,2318=)(*)-20(34*)]
> TJ
>
> The problem is that there appears to be an open bracket [ in
> the middle of this line. If you search for -20(/>)[(*)-15
> the problem is that open bracket. This makes the parser
> think it's reading an array inside the array. The ending ]
> then closes the inner array, and the whole thing blows up.
>
> At first blush, this looks like it's just a bad PDF. But the
> trick is that Acrobat parses and renders this thing just fine.
>
> So my question is: Is it possible that the above is actually
> valid per the PDF spec, and we are just missing something
> with the tokeniser or parser?
> It wouldn't seem like it would be valid. But if that were the
> case, you'd really think that Acrobat wouldn't be able to
> parse it, either.
>
> Are we missing something in our parser, or is Acrobat doing
> some sort of intense logic to reconstruct the TJ operation if
> the array doesn't terminate properly? I've done some
> thinking on this, and I see no reasonable strategy for
> determining where in the content stream to insert an artificial ]
>
> --
> View this message in context:
> http://itext-general.2136553.n4.nabble.com/Content-stream-question-tp3946312p3946312.html
> Sent from the iText - General mailing list archive at Nabble.com.
>
> ------------------------------------------------------------------------------
> The demand for IT networking professionals continues to grow,
> and the demand for specialized networking skills is growing
> even more rapidly.
> Take a complimentary Learning@Cisco Self-Assessment and learn
> about Cisco certifications, training, and career opportunities.
> http://p.sf.net/sfu/cisco-dev2dev
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> iText(R) is a registered trademark of 1T3XT BVBA.
> Many questions posted to this list can (and will) be answered
> with a reference to the iText book:
> http://www.itextpdf.com/book/ Please check the keywords list
> before you ask for examples: http://itextpdf.com/themes/keywords.php
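
For anyone following the thread, here is a rough sketch of the symptom described in the quoted message: a second [ that is never closed leaves the array depth at 1 when the operator is reached. This is not iText code. The class BracketDepth and the method finalDepth are made-up names for illustration, and the scanner only understands literal strings in parentheses and square brackets (no hex strings, comments, or dictionaries). The sample strings are shortened from the line in the original post.

```java
public class BracketDepth {

    /**
     * Scans a content stream fragment and returns how many '[' are still
     * open at the end. Brackets inside literal strings "( ... )" are ignored.
     * Hypothetical helper, not part of iText.
     */
    static int finalDepth(String content) {
        int depth = 0;   // number of '[' not yet matched by ']'
        int parens = 0;  // nesting level inside a literal string
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (parens > 0) {
                if (c == '\\') {
                    i++;             // skip the escaped character
                } else if (c == '(') {
                    parens++;        // balanced parentheses may nest in a string
                } else if (c == ')') {
                    parens--;
                }
            } else if (c == '(') {
                parens = 1;          // entering a literal string
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        // Shortened, well-formed TJ array.
        String ok = "[(*)-15(&+,)(*)-20(/>)] TJ";
        // Same idea with the stray '[' from the original post.
        String stray = "[(*)-15(&+,)(*)-20(/>)[(*)-15(&+,)(*)-20(34*)] TJ";

        System.out.println("well-formed depth: " + finalDepth(ok));
        System.out.println("stray bracket depth: " + finalDepth(stray));
    }
}
```

A non-zero result at the point where TJ appears is the condition the parser runs into. How a lenient reader should recover from it is the open question in the thread, and this sketch does not attempt that.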
