Petr Slaby created PDFBOX-3340:
----------------------------------
Summary: Image decoded twice without a real need
Key: PDFBOX-3340
URL: https://issues.apache.org/jira/browse/PDFBOX-3340
Project: PDFBox
Issue Type: Bug
Reporter: Petr Slaby
Priority: Minor
Take the PDF from PDFBOX-1708, put a breakpoint in the decode() method of the
CCITTFaxFilter class, and run PDFToImage. The debugger stops twice, even
though the PDF contains only a single image.
The second call arrives when the image is rendered to G2D, which is OK. The
first call, however, comes from the constructor of PDImageXObject, which
decompresses the image at line 147:
{noformat}
this(stream, resources, stream.createInputStream());
{noformat}
just to allow the filter (CCITTFaxFilter in this case) to provide additional
dictionary parameters in case something is missing in the input (COLORSPACE
would be set to DeviceGray if missing here).
I think this is wasted work. The filter should be able to fix the
dictionary without having to decode the image. As far as I can tell, this could
be done by adding a repair method to COSStream and to the implementations of
Filter.
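To illustrate the idea, here is a minimal sketch of a filter whose dictionary
repair is separate from decoding. It is not PDFBox code: SketchFilter,
SketchCcittFilter and repair() are hypothetical names, and a plain Map stands
in for the COS dictionary.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the PDFBox API: a Map stands in for the COS dictionary.
class RepairSketch {
    public static void main(String[] args) {
        Map<String, Object> dict = new HashMap<>();
        SketchCcittFilter filter = new SketchCcittFilter();

        // Fix the dictionary up front; no image data is decoded here.
        filter.repair(dict);
        System.out.println(dict.get("ColorSpace") + ", decode calls: " + filter.decodeCalls);
    }
}

// Hypothetical filter contract: decoding and dictionary repair are separate steps.
interface SketchFilter {
    byte[] decode(byte[] encoded, Map<String, Object> dict);

    // Fills in missing dictionary entries without touching the encoded data.
    void repair(Map<String, Object> dict);
}

class SketchCcittFilter implements SketchFilter {
    int decodeCalls = 0; // counts how often the expensive path is taken

    @Override
    public byte[] decode(byte[] encoded, Map<String, Object> dict) {
        decodeCalls++;
        repair(dict);
        return encoded.clone(); // placeholder for the real decompression
    }

    @Override
    public void repair(Map<String, Object> dict) {
        // Only add the default when the entry is missing from the input.
        dict.putIfAbsent("ColorSpace", "DeviceGray");
    }
}
```

With such a split, the constructor could call repair() and leave the single
decode() call to the renderer.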
Also, I do not see where the stream created in the above-mentioned constructor
of PDImageXObject is ever closed. This seems to be a more general issue.
I put a counter into COSInputStream.create(), at the point where it calls new
RandomAccessInputStream(buffer). With the test file from PDFBOX-1708, I end up
with 3 unclosed streams when the program finishes. I am not sure whether this
is important, but I guess the unclosed streams are needlessly occupying space in
the scratch file.
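The counting approach can be reproduced outside PDFBox. The sketch below uses
only java.io; CountingStream and LeakDemo are hypothetical names and are not
the counter I added to COSInputStream. It shows how a stream that is read but
never closed stays counted as open, while try-with-resources releases it.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: counts streams that were opened but not yet closed.
class LeakDemo {
    // Reads one byte and forgets to close the stream.
    static int readLeaky(byte[] data) {
        try {
            InputStream in = new CountingStream(new ByteArrayInputStream(data));
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one byte; try-with-resources closes the stream on exit.
    static int readClosed(byte[] data) {
        try (InputStream in = new CountingStream(new ByteArrayInputStream(data))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        readLeaky(new byte[] {1});
        System.out.println("open after leaky read: " + CountingStream.OPEN.get());
        readClosed(new byte[] {1});
        System.out.println("open after closed read: " + CountingStream.OPEN.get());
    }
}

class CountingStream extends FilterInputStream {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    CountingStream(InputStream in) {
        super(in);
        OPEN.incrementAndGet();
    }

    @Override
    public void close() throws IOException {
        if (!closed) { // guard so a double close is not counted twice
            closed = true;
            OPEN.decrementAndGet();
        }
        super.close();
    }
}
```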
Sorry if this is just a lack of understanding of the code on my side, but I
could not resist reporting what I see.