[ 
https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359065#comment-17359065
 ] 

Andreas Lehmkühler commented on PDFBOX-5207:
--------------------------------------------

AFAIK nested arrays are not allowed as the operand of a TJ operator. The PDF in 
question has at least one malformed array (nested array, unbalanced number of 
square brackets). Before PDFBOX-5190 such arrays were skipped; now the parser 
reads as much as possible. Those nested arrays lead to an IOException in 
{{org.apache.pdfbox.contentstream.PDFStreamEngine.showTextStrings(COSArray)}}. 
I'm thinking about skipping such nested arrays and continuing with the 
remaining part. In the current case this improves the rendering.
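
A minimal sketch of that "skip nested arrays and continue" idea. This is not 
PDFBox code: the class {{TjOperandFilter}} and the method 
{{skipNestedArrays}} are hypothetical names, and the parsed operand is 
modelled as a plain {{List<Object>}} instead of a COSArray, assuming strings 
stand for text and numbers for position adjustments.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the TJ operand handling discussed above.
 * Not PDFBox code: a parsed array is modelled as List<Object>, where a
 * String is a text string, a Number is a glyph position adjustment and
 * a List is a (malformed) nested array.
 */
final class TjOperandFilter {

    private TjOperandFilter() {
    }

    /** Returns the operand elements in order, leaving out nested arrays. */
    static List<Object> skipNestedArrays(List<?> operand) {
        List<Object> kept = new ArrayList<>();
        for (Object element : operand) {
            if (element instanceof List) {
                // Malformed input: drop the nested array and keep going
                // with the remaining elements instead of failing.
                continue;
            }
            kept.add(element);
        }
        return kept;
    }
}
```

With this model, an operand of ["Hel", -120, ["x", 5], "lo"] comes back as 
["Hel", -120, "lo"], so the text before and after the nested array is still 
shown.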

BTW: we should think about refactoring 
{{org.apache.pdfbox.pdfparser.PDFStreamParser}}. It uses COS objects when 
parsing a content stream. Although the operands in a content stream are very 
similar to COS objects, they are not the same thing. Handling them separately 
should simplify the parsing and reduce the resources used. But that is another 
story ...

> Page not rendered / extracted, Unknown type in array for TJ operation
> ---------------------------------------------------------------------
>
>                 Key: PDFBOX-5207
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5207
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.23
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>         Attachments: ContentStream.txt, evince-395-0.zip-0.pdf
>
>
> Worked in 2.0.23, no longer works now. The weird thing is that the content 
> stream (attached) is the same. It contains a "[" in an array at offset 4211.



