[ 
https://issues.apache.org/jira/browse/PDFBOX-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226084#comment-13226084
 ] 

Dave Smith commented on PDFBOX-1232:
------------------------------------

Sent in private.
                
> FlateDecoder in stream mode
> ---------------------------
>
>                 Key: PDFBOX-1232
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1232
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Dave Smith
>
> zlib (the underlying spec for Flate compression) does not require a
> Z_STREAM_END to terminate the compressed data. The Java InflaterInputStream
> really assumes that you are reading a zip or gzip file, which will always
> have a Z_STREAM_END (Z_STREAM_END is a constant in the zlib library, which
> Java calls natively). So the following chunk decodes fine using the jcraft
> zlib decoder, but fails using InflaterInputStream.
> 3 0 obj
> <<
> /Type /XObject
> /Subtype /Form
> /FormType 1
> /Resources << /Font 4 0 R
> /ProcSet [/PDF /ImageC /Text]>>
> /BBox [0 0 595 842]
> /Matrix [1 0 0 1 0 0]
> /Filter /FlateDecode
> /Length 5 >>
> stream
> H<89>^C^@
> endstream
> endobj
> The stream bytes are 72, -119, 3, 0, 13 (decimal, as signed Java bytes).
> They decode to an empty string.
> The fix is to use Inflater directly and stop once it has consumed all of
> the input buffer and has nothing more to write into the output buffer.
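> // Needs java.util.zip.Inflater and java.util.zip.DataFormatException.
> protected ByteArrayOutputStream decompress(InputStream in)
>       throws IOException, DataFormatException
>   {
>       ByteArrayOutputStream out = new ByteArrayOutputStream();
>       byte[] buf = new byte[1000];
>       Inflater inflater = new Inflater();
>       try
>       {
>               int read = in.read(buf);
>               if (read <= 0)
>               {
>                       // read() returns -1 (not 0) at end of stream
>                       return out;
>               }
>               inflater.setInput(buf, 0, read);
>               byte[] res = new byte[1000];
>               while (true)
>               {
>                       int resRead = inflater.inflate(res);
>                       if (resRead != 0)
>                       {
>                               out.write(res, 0, resRead);
>                               continue;
>                       }
>                       // No output: stop if the stream is complete, wants a
>                       // preset dictionary, or there is no more input to feed.
>                       if (inflater.finished() || inflater.needsDictionary()
>                               || in.available() == 0)
>                       {
>                               break;
>                       }
>                       read = in.read(buf);
>                       if (read <= 0)
>                       {
>                               break;
>                       }
>                       inflater.setInput(buf, 0, read);
>               }
>               out.close();
>               return out;
>       }
>       finally
>       {
>               inflater.end(); // release native zlib memory
>       }
>   }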
> We then need to change
> FlateFilter.decode(InputStream compressedData, OutputStream result,
> COSDictionary options, int filterIndex)
> to look like ...
>   if (compressedData.available() > 0)
>   {
>       try
>       {
>           baos = decompress(compressedData);
>       }
>       ...
>       if (predictor == -1 || predictor == 1)
>       {
>           result.write(baos.toByteArray());
>       }
>       else
>       {
>           // use the ByteArrayOutputStream as before ...
>       }
>   }
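
The behaviour described in the report can be reproduced outside PDFBox with the JDK alone. The following is a minimal sketch, not PDFBox code: the class name TruncatedFlateDemo and both helper methods are made up for illustration, and the only input is the five bytes quoted in the issue. It feeds the same bytes to InflaterInputStream and to a bare Inflater that simply stops when no more progress is possible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class; not part of PDFBox.
class TruncatedFlateDemo {
    // The five stream bytes from the report: a zlib header (0x48 0x89), an
    // empty final deflate block (0x03 0x00), then a stray CR (13).
    // The zlib checksum trailer is missing.
    static final byte[] BLOB = {72, -119, 3, 0, 13};

    /** Returns true if InflaterInputStream reads the data to EOF without an IOException. */
    static boolean streamDecodes(byte[] data) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[64];
            while (in.read(buf) != -1) {
                // discard decoded bytes; only success or failure matters here
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Inflates with a bare Inflater and stops as soon as it makes no progress. */
    static byte[] inflateLeniently(byte[] data) {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            inflater.setInput(data);
            byte[] buf = new byte[64];
            while (!inflater.finished() && !inflater.needsDictionary()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    break; // input exhausted and nothing left to emit
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            // Corrupt data is a different failure from a missing trailer.
            throw new IllegalArgumentException("corrupt deflate data", e);
        } finally {
            inflater.end(); // release native zlib memory
        }
    }

    public static void main(String[] args) {
        System.out.println("InflaterInputStream ok: " + streamDecodes(BLOB));
        System.out.println("Inflater output bytes: " + inflateLeniently(BLOB).length);
    }
}
```

On a standard JDK this should report that InflaterInputStream fails (it hits end of input while still waiting for the trailer) and that the bare Inflater yields zero bytes, which matches the "decodes to an empty string" observation in the issue.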
