[ https://issues.apache.org/jira/browse/PDFBOX-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226084#comment-13226084 ]
Dave Smith commented on PDFBOX-1232:
------------------------------------

Sent in private.

> FlateDecoder in stream mode
> ---------------------------
>
>                 Key: PDFBOX-1232
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1232
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Dave Smith
>
> The zlib format (the underlying spec for Flate compression) does not require a
> Z_STREAM_END to terminate the compression. The Java InflaterInputStream
> assumes that you are reading a zip or gzip file, which will always have a
> Z_STREAM_END (Z_STREAM_END is a constant in the zlib library, which Java
> calls natively). So the following chunk decodes fine using the jcraft zlib
> decoder, but fails using the InflaterInputStream.
>
> 3 0 obj
> <<
> /Type /XObject
> /Subtype /Form
> /FormType 1
> /Resources << /Font 4 0 R
> /ProcSet [/PDF /ImageC /Text]>>
> /BBox [0 0 595 842]
> /Matrix [1 0 0 1 0 0]
> /Filter /FlateDecode
> /Length 5 >>
> stream
> H<89>^C^@
> endstream
> endobj
>
> The blob is 72, -119, 3, 0, 13 decimal. It decodes to an empty string.
>
> The fix is to use Inflater directly: when inflate() has nothing more to
> write into the output buffer, check whether it has consumed all of the
> input buffer, and if so stop instead of failing.
>
> protected ByteArrayOutputStream decompress(InputStream in)
>     throws IOException, DataFormatException
> {
>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>     byte[] buf = new byte[1000];
>     Inflater inflater = new Inflater();
>     int read = in.read(buf);
>     // read() returns -1 at end of stream, so test for <= 0 rather than == 0
>     if (read <= 0)
>     {
>         return out;
>     }
>     inflater.setInput(buf, 0, read);
>     byte[] res = new byte[1000];
>     while (true)
>     {
>         int resRead = inflater.inflate(res);
>         if (resRead != 0)
>         {
>             out.write(res, 0, resRead);
>             continue;
>         }
>         // Nothing left to write: stop if the stream ended, a preset
>         // dictionary is required, or the input is exhausted.
>         if (inflater.finished() || inflater.needsDictionary()
>             || in.available() == 0)
>         {
>             out.close();
>             return out;
>         }
>         read = in.read(buf);
>         if (read <= 0)
>         {
>             // available() was only an estimate; treat EOF as end of data
>             out.close();
>             return out;
>         }
>         inflater.setInput(buf, 0, read);
>     }
> }
>
> We then need to change
> FlateFilter.decode(InputStream compressedData, OutputStream result,
>                    COSDictionary options, int filterIndex)
> to look like ...
>
> if (compressedData.available() > 0)
> {
>     try
>     {
>         baos = decompress(compressedData);
>     }
>     catch (DataFormatException e)
>     {
>         // surface corrupt data through the method's existing IOException
>         throw new IOException(e.getMessage());
>     }
>     if (predictor == -1 || predictor == 1)
>     {
>         result.write(baos.toByteArray());
>     }
>     else
>     {
>         // use the ByteArrayOutputStream as before ...
>     }
> }
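
For anyone wanting to reproduce this outside PDFBox, here is a minimal standalone sketch of the behaviour described above. It is not the PDFBox code: the class name LenientFlateSketch and the methods strictInflate and lenientInflate are hypothetical names chosen for illustration, and the only library calls are from java.util.zip and java.io. The strict path wraps the five-byte blob from the report in an InflaterInputStream; the lenient path drives Inflater by hand and stops when the input runs out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration class; not part of PDFBox.
class LenientFlateSketch {

    // The five bytes quoted in the issue: 'H', 0x89, 0x03, 0x00, CR.
    static final byte[] BLOB = {72, -119, 3, 0, 13};

    /** Strict path: InflaterInputStream expects a complete zlib stream. */
    public static byte[] strictInflate(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    /** Lenient path: return whatever was decoded once the input is exhausted. */
    public static byte[] lenientInflate(InputStream in)
            throws IOException, DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Inflater inflater = new Inflater();
        byte[] inBuf = new byte[1024];
        byte[] outBuf = new byte[1024];
        try {
            while (!inflater.finished() && !inflater.needsDictionary()) {
                if (inflater.needsInput()) {
                    int read = in.read(inBuf);
                    if (read <= 0) {
                        break; // input exhausted: accept the truncated stream
                    }
                    inflater.setInput(inBuf, 0, read);
                }
                int produced = inflater.inflate(outBuf);
                out.write(outBuf, 0, produced);
                if (produced == 0 && !inflater.needsInput() && !inflater.finished()) {
                    break; // defensive guard against spinning without progress
                }
            }
        } finally {
            inflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        try {
            strictInflate(BLOB);
            System.out.println("strict path succeeded");
        } catch (IOException e) {
            System.out.println("strict path failed: " + e.getClass().getName());
        }
        byte[] decoded = lenientInflate(new ByteArrayInputStream(BLOB));
        System.out.println("lenient path decoded " + decoded.length + " bytes");
    }
}
```

Unlike the decompress() method quoted above, this sketch does not consult in.available(); it relies on read() returning -1, which also behaves sensibly for streams whose available() is only an estimate. Whether silently accepting a truncated stream is acceptable for every caller is a design decision for the filter, not something the sketch settles.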