tomas kochan created PDFBOX-4750:
------------------------------------

             Summary: java.io.IOException: Error:Unknown type in content stream:COSNull{}
                 Key: PDFBOX-4750
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4750
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 2.0.18, 2.0.8
            Reporter: tomas kochan


When we remove optional content from a specific document (content bracketed 
by the operators BDC and EMC), writing the modified list of tokens into a 
PDStream fails. 
 The code looks like this:
 PDStream updatedStream = new PDStream(document);
 OutputStream out = updatedStream.getCOSObject().createRawOutputStream();
 ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
 tokenWriter.writeTokens(result);
 out.flush();
 out.close();
 page.setContents(updatedStream);
  
 The following exception occurs at line 'tokenWriter.writeTokens(result);' :
 java.io.IOException: Error:Unknown type in content stream:COSNull{}
 at 
org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:199)
 at 
org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:146)
 at 
org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:181)
 at 
org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeTokens(ContentStreamWriter.java:109)
 at 
de.justiz.eip.pdf.tools.PdfContext.getOrRemoveOptionalTextContentfromPage(PdfContext.java:429)
 at 
de.justiz.eip.pdf.tools.paging.PagingInfoInterpreterPdfContext.removePagingInfo(PagingInfoInterpreterPdfContext.java:325)
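
For context, the token list passed to writeTokens above is the result of dropping a BDC ... EMC section. The report does not show that filtering code, so the following is only a minimal sketch of the idea under stated assumptions: `Op` and `MarkedContentFilter` are hypothetical stand-ins (not PDFBox classes), and operands are modelled as plain strings rather than COS objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a content-stream operator token. */
final class Op {
    final String name;
    Op(String name) { this.name = name; }
    @Override public String toString() { return "Op{" + name + "}"; }
}

final class MarkedContentFilter {
    /**
     * Returns a copy of tokens without the BDC ... EMC section whose
     * second operand (the property name) equals propertyName.
     */
    static List<Object> removeSection(List<Object> tokens, String propertyName) {
        List<Object> out = new ArrayList<Object>();
        int depth = 0; // > 0 while inside the section being removed
        for (Object t : tokens) {
            String op = (t instanceof Op) ? ((Op) t).name : null;
            if (depth > 0) {
                // Track nesting so that only the matching EMC ends the removal.
                if ("BDC".equals(op) || "BMC".equals(op)) {
                    depth++;
                } else if ("EMC".equals(op)) {
                    depth--;
                }
                continue;
            }
            if ("BDC".equals(op) && out.size() >= 2
                    && propertyName.equals(out.get(out.size() - 1))) {
                // Drop the two operands (tag and property name) already copied.
                out.remove(out.size() - 1);
                out.remove(out.size() - 1);
                depth = 1;
                continue;
            }
            out.add(t);
        }
        return out;
    }
}
```

Every token outside the removed section is copied unchanged, so an inline image dictionary that holds null values would reach the writer exactly as the parser produced it.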

 

After analyzing the problem, we found two issues:
 1. We assume that the PDF document itself is corrupt. At one point it contains 
the operator BI, which according to the PDF Reference 1.7 begins an inline 
image object. In the token list, this operator is not followed by an "ID" or 
"EI" operator. 
 Extract from the list of tokens:
 next PDFOperator{Do}
 next COSFloat{0.016674607}
 next COSInt{0}
 next COSInt{0}
 next COSFloat{0.061831153}
 next COSFloat{0.070509767}
 next COSFloat{-0.302021403}
 next PDFOperator{cm}
 next PDFOperator{BI}
 next PDFOperator{Q}
 next PDFOperator{Q}
 next COSName{OC}
 next COSName{eAkteOptionalContent7}
 next PDFOperator{BDC}

Moreover, the "DP" entry in the "BI" operator's COSDictionary contains a 
COSArray of COSNull values. We assume, however, that COSNull values are not 
forbidden in PDF content. 
 
COSDictionary{COSName{Interpolate}:true;COSName{W}:COSInt{35};COSName{H}:COSInt{26};COSName{CS}:COSName{RGB};COSName{BPC}:COSInt{8};COSName{F}:COSArray{[COSName{A85},COSName{DCT}]};COSName{DP}:COSArray{[COSNull{},COSNull{}]};}

2. Regardless of the questionable content in the PDF document (described 
above), the PDFBox API fails when writing these operators into a PDStream, 
because COSNull is not recognized by the method 
org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object)

 

Our assumption is that the method "writeObject" does not treat COSNull as a 
valid input. org.apache.pdfbox.cos.COSNull.NULL is a valid object that is 
widely used by the PDFBox API itself.

In PDFBox 2.0.8 and also in 2.0.18, the method 
org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) does not 
cover the COSNull case in its if/else chain; instead it throws 
new IOException( "Error:Unknown type in content stream:" + o ). 
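
To illustrate what we mean, here is a minimal, self-contained sketch of an if/else dispatch with an added branch for the null object. It is not the PDFBox source: `SketchTokenWriter`, `NullToken` and `render` are hypothetical stand-ins, and the supported types are reduced to a few Java built-ins. The only PDF-specific fact used is that the null object is written as the keyword `null`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in playing the role of COSNull.NULL. */
final class NullToken {
    static final NullToken NULL = new NullToken();
    private NullToken() { }
    @Override public String toString() { return "NullToken{}"; }
}

final class SketchTokenWriter {
    private final OutputStream out;

    SketchTokenWriter(OutputStream out) { this.out = out; }

    /** Writes one token; unknown types are rejected as in the reported stack trace. */
    void writeObject(Object o) throws IOException {
        if (o instanceof Integer || o instanceof Float || o instanceof Boolean) {
            write(o.toString() + " ");
        } else if (o instanceof NullToken) {
            // The branch that the report says is missing: emit the PDF keyword.
            write("null ");
        } else if (o instanceof List) {
            write("[");
            for (Object item : (List<?>) o) {
                writeObject(item); // arrays may contain nulls, as in the DP entry
            }
            write("] ");
        } else {
            throw new IOException("Error:Unknown type in content stream:" + o);
        }
    }

    private void write(String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Convenience helper for trying the sketch: renders one token to a string. */
    static String render(Object token) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new SketchTokenWriter(buf).writeObject(token);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

With this branch in place, `SketchTokenWriter.render(Arrays.asList((Object) NullToken.NULL, NullToken.NULL))` yields an array of two `null` keywords instead of an exception. Whether the real writeObject should handle COSNull in the same way is for the PDFBox developers to decide.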

 

Could you confirm that the method writeObject contains a bug and should be 
corrected to handle COSNull objects as well? If so, in which version can we 
expect the fix?


Thank you

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
