I have an existing PDF that I'm trying to parse text out of, and am winding up with a null pointer exception when reading an array in the content stream.
I have narrowed the problem down to a particular line in the content stream (if I run this one line through PdfContentParser.parse(), it fails). Here is the line (sorry this is so ugly; I'll describe the exact location of the problem in a second):

[(*)-15(*)-15(&+,)(*)-15(-./0,123/45)(*)-15(/.)(*)-15(/2+,.)(*)-15(346/.7823/4)(*)-15(9,4,.82,:)(*)-15(;<)(*)-15(&+,)(*)-15(!==>52.823/4)(*)-15(?,4,.82/.)(*)-20(.,98.:349)(*)-20(2+,)(*)-20(=3@,=3+//:)(*)-20(/6)(*)-20(A8.3/>5)(*)-20(34A,527,42)(*)-20(/>)[(*)-15(*)-15(&+,)(*)-15(-./0,123/45)(*)-15(/.)(*)-15(/2+,.)(*)-15(346/.7823/4)(*)-15(9,4,.82,:)(*)-15(;<)(*)-15(&+,)(*)-15(!==>52.823/4)(*)-15(?,4,.82/.)(*)-20(.,98.:349)(*)-20(2+,)(*)-20(=3@,=3+//:)(*)-20(/6)(*)-20(A8.3/>5)(*)-20(34A,527,42)(*)-20(/>)(21/7,5)(*)-20(8.,)(*)-20(+<-/2+,2318=)(*)-20(34*)] TJ

The problem is that there appears to be an open bracket [ in the middle of this line. If you search for -20(/>)[(*)-15 you will land on it. This makes the parser think it's reading an array nested inside the outer array. The ] at the end of the line then closes the inner array, the outer array is never closed, and the whole thing blows up.

At first blush, this looks like it's just a bad PDF. But the trick is that Acrobat parses and renders this thing just fine.

So my question is: is it possible that the above is actually valid per the PDF spec, and we are just missing something in the tokeniser or parser? It doesn't seem like it would be valid. But if it were invalid, you'd really think that Acrobat wouldn't be able to parse it, either. Are we missing something in our parser, or is Acrobat doing some sort of intense logic to reconstruct the TJ operation when the array doesn't terminate properly?

I've done some thinking on this, and I see no reasonable strategy for determining where in the content stream to insert an artificial ].
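To confirm that the stray [ really sits outside any string literal, the bracket nesting can be checked independently of iText. What follows is only a minimal sketch in plain Java: the class ArrayBracketScan and its methods are hypothetical names, not part of iText, and the scanner assumes the input uses only literal strings in parentheses (it ignores hex strings, comments, and dictionaries).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: reports '[' characters that open
// while another array is already open, ignoring brackets inside ( ) strings.
final class ArrayBracketScan {

    /** Result of one pass over a content stream fragment. */
    static final class Result {
        final List<Integer> nestedOpenOffsets; // offsets of '[' seen at depth > 0
        final int unclosedArrays;              // arrays still open at end of input

        Result(List<Integer> nestedOpenOffsets, int unclosedArrays) {
            this.nestedOpenOffsets = nestedOpenOffsets;
            this.unclosedArrays = unclosedArrays;
        }
    }

    static Result scan(String content) {
        List<Integer> nested = new ArrayList<>();
        int arrayDepth = 0;
        int stringDepth = 0; // literal strings may contain balanced parentheses

        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (stringDepth > 0) {
                if (c == '\\') {
                    i++; // skip the escaped character, e.g. \( or \)
                } else if (c == '(') {
                    stringDepth++;
                } else if (c == ')') {
                    stringDepth--;
                }
            } else if (c == '(') {
                stringDepth = 1;
            } else if (c == '[') {
                if (arrayDepth > 0) {
                    nested.add(i); // an array opening inside another array
                }
                arrayDepth++;
            } else if (c == ']') {
                if (arrayDepth > 0) {
                    arrayDepth--;
                }
            }
        }
        return new Result(nested, arrayDepth);
    }

    public static void main(String[] args) {
        // Shortened excerpt around the stray bracket from the line above.
        String excerpt = "[(34A,527,42)(*)-20(/>)[(*)-15(*)-15(&+,)(*)-20(34*)] TJ";
        Result r = scan(excerpt);
        System.out.println("nested '[' at offsets: " + r.nestedOpenOffsets);
        System.out.println("arrays left open: " + r.unclosedArrays);
    }
}
```

If the scan reports a nested [ and one array left open, the stream really is structurally unbalanced at that point, and the question becomes purely one of how lenient the reader should be.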
