Kevin,

Kevin Day wrote:
> I have an existing PDF that I'm trying to parse text out of, and am
> winding up with a null pointer exception when reading an array in the
> content stream.
>
> I have narrowed the problem down to a particular line in the content
> stream (if I run this one line through PdfContentParser.parse() it fails):
> [...]
>
> The problem is that there appears to be an open bracket [ in the middle of
> this line. If you search for -20(/>)[(*)-15 the problem is that open
> bracket. This makes the parser think it's reading an array inside the
> array. The ending ] then closes the inner array, and the whole thing
> blows up.
>
> At first blush, this looks like it's just a bad PDF. But the trick is
> that Acrobat parses and renders this thing just fine.
Acrobat is quite lax about errors, so emulating it need not be your aim.

To me it looks like the beginning of that line,

"[(*)-15(*)-15(&+,)(*)-15(-./0,123/45)(*)-15(/.)(*)-15(/2+,.)(*)-15(346/.7823/4)(*)-15(9,4,.82,:)(*)-15(;<)(*)-15(&+,)(*)-15(!==>52.823/4)(*)-15(?,4,.82/.)(*)-20(.,98.:349)(*)-20(2+,)(*)-20(=3@,=3+//:)(*)-20(/6)(*)-20(A8.3/>5)(*)-20(34A,527,42)(*)-20(/>)"

is doubled, and most likely unintentionally so. Does Acrobat actually display these contents twice? If it doesn't, maybe it just ignores the first occurrence. If it does, maybe it ignores the extra '[' because it does not expect inner arrays.

Kevin Day wrote:
> Are we missing something in our parser, or is Acrobat doing some sort of
> intense logic to reconstruct the Tj operation if the array doesn't
> terminate properly? I've done some thinking on this, and I see no
> reasonable strategy for determining where in the content stream to insert
> an artificial ]

As mentioned above, I think it more likely that Acrobat simply ignores something, either the doubled beginning of the line or the extra '['. I doubt that behaviour is required by the spec; most likely Acrobat is simply lax about its input. And I doubt you want to be as lax in automated processes...

Regards,
Michael
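
P.S. If you would rather have your automated process flag such a stream than guess at a repair, a small pre-check might be enough. The following is only a rough sketch: the class and method names are made up for illustration and are not part of iText. It scans one piece of content stream text and reports where a '[' appears while an array is already open, skipping brackets inside literal strings. It assumes the text contains no comments or inline image data. Nested arrays are legal PDF syntax in general, but the operand array of a TJ operator should only hold strings and numbers, so a hit inside such an array is suspicious.

```java
final class TjArrayCheck {

    /**
     * Returns the index of the first '[' found while an array is already
     * open, or -1 if there is none. Brackets inside literal strings
     * "( ... )" are ignored, including escaped and nested parentheses.
     */
    static int firstNestedArrayStart(String content) {
        int arrayDepth = 0; // number of '[' currently open
        int parenDepth = 0; // nesting depth inside a literal string
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (parenDepth > 0) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '(') {
                    parenDepth++;
                } else if (c == ')') {
                    parenDepth--;
                }
                continue;
            }
            switch (c) {
                case '(':
                    parenDepth = 1;
                    break;
                case '[':
                    if (arrayDepth > 0) {
                        return i; // an array opened inside an array
                    }
                    arrayDepth++;
                    break;
                case ']':
                    if (arrayDepth > 0) {
                        arrayDepth--;
                    }
                    break;
                default:
                    break;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the line quoted above, not the real data.
        String clean = "[(*)-15(/.)(*)-20(/>)] TJ";
        String doubled = "[(*)-15(/.)(*)-20(/>)[(*)-15(/.)] TJ";
        System.out.println("clean:   " + firstNestedArrayStart(clean));
        System.out.println("doubled: " + firstNestedArrayStart(doubled));
    }
}
```

What to do with a hit (reject the file, log it, or drop the extra '[') is a policy decision for your process; the sketch only locates the spot.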
