[patch] Please don't print logging statements to System.err
-----------------------------------------------------------
Key: PDFBOX-742
URL: https://issues.apache.org/jira/browse/PDFBOX-742
Project: PDFBox
Issue Type: Improvement
Components: PDModel
Affects Versions: 1.1.0
Reporter: Antoni Mylka
Three org.apache.pdfbox.filter.Filter implementations are not actually
implemented. These are:
CCITTFaxDecodeFilter
DCTFilter
RunLengthDecodeFilter
Each of them writes a warning directly to System.err, with a message like:
Warning: DCTFilter.decode is not implemented yet, skipping this stream.
In my code I iterate over all images in a PDF and try to obtain their raw,
undecoded content. I use code like this:
    private byte[] getUnDecodedImageBytes(COSStream st) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Copy whatever getUnfilteredStream() yields into the buffer.
        IOUtil.writeStream(st.getUnfilteredStream(), baos);
        return baos.toByteArray();
    }
When getUnfilteredStream() is called on the stream of an embedded JPG image, it
seems to invoke the DCTFilter. For a large ebook file with lots of JPG images,
this writes a large amount of text to standard error, and that output can't be
suppressed.
PDFBox uses commons-logging all over the place, so why not send those warnings
to the log? They are non-critical. In my particular case, the method above
returns an empty array for such images. When it does, I fall back to another
method:
    private byte[] getDecodedImageBytes(COSStream st) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Go through the image XObject instead of reading the stream directly.
        PDXObjectImage ximage = (PDXObjectImage) PDXObject.createXObject(st);
        ximage.write2OutputStream(baos);
        return baos.toByteArray();
    }
This seems to work, even for those images where getUnfilteredStream returns an
empty stream.
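
The fallback described above could be sketched roughly as follows. This is
only an illustration of the control flow, not code from my application or
from PDFBox: the class ImageBytesFallbackSketch, the ByteSource interface and
readWithFallback are hypothetical names, and the two lambdas stand in for
getUnDecodedImageBytes and getDecodedImageBytes.

```java
import java.io.IOException;

public class ImageBytesFallbackSketch {

    /** Hypothetical abstraction over the two extraction methods in this report. */
    public interface ByteSource {
        byte[] read() throws IOException;
    }

    /**
     * Returns the primary result, unless it is null or empty,
     * in which case the fallback source is read instead.
     */
    public static byte[] readWithFallback(ByteSource primary, ByteSource fallback)
            throws IOException {
        byte[] bytes = primary.read();
        if (bytes == null || bytes.length == 0) {
            bytes = fallback.read();
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins: an empty "undecoded" result and a non-empty "decoded" one.
        byte[] result = readWithFallback(() -> new byte[0], () -> new byte[] {1, 2, 3});
        System.out.println(result.length);
    }
}
```

In the real code the two sources would simply be the two methods shown in
this report, each called with the same COSStream.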
I don't quite understand the difference, since I would expect a method named
'getUnfilteredStream' to return the stream as it is in the PDF file, without
applying any filters. Moreover, such a failure would imply that the library
simply cannot process JPG images in PDF files, which is not the case, because
write2OutputStream works OK. So I don't know where the real problem lies. Maybe
someone with more PDFBox knowledge could take a look.
Still, my patch only moves those warnings to the log, where I can suppress
them. This is simple and fixes the immediate problem in my application.
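
To illustrate the kind of change meant here, below is a minimal sketch. It is
not the attached patch. It uses java.util.logging as a stand-in so that it
runs without extra dependencies; in PDFBox the same idea would go through
commons-logging (LogFactory.getLog(...) and log.warn(...)). The class
FilterWarningSketch and its method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class FilterWarningSketch {

    // Stand-in logger; PDFBox itself would use a commons-logging Log here.
    private static final Logger LOG =
            Logger.getLogger(FilterWarningSketch.class.getName());

    private static final String MESSAGE =
            "DCTFilter.decode is not implemented yet, skipping this stream.";

    /** Before: the warning goes straight to standard error. */
    public static void decodeWithSystemErr() {
        System.err.println("Warning: " + MESSAGE);
    }

    /** After: the warning goes through the logger, so the caller controls it. */
    public static void decodeWithLogger() {
        LOG.warning(MESSAGE);
    }

    /** Lets an application switch these warnings on or off. */
    public static void setWarningsEnabled(boolean enabled) {
        LOG.setLevel(enabled ? Level.WARNING : Level.SEVERE);
    }

    public static void main(String[] args) {
        setWarningsEnabled(false);
        decodeWithLogger();     // filtered out by the logger level
        decodeWithSystemErr();  // still written to standard error
    }
}
```

With the logger variant, an application that iterates over many images can
raise the log level for this one category and keep its standard error clean.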