[ https://issues.apache.org/jira/browse/PDFBOX-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002812#comment-13002812 ]
David Newton commented on PDFBOX-958:
-------------------------------------
I've been looking at this myself. It's not a problem across all PDFs, and the
given example is actually the only PDF in which I've been able to reproduce
it. I don't see anything remarkable in the image's data as shown by the
PDFDebugger tool:
I2:Stream (XObject:Image)
    BitsPerComponent: 8
    ColorSpace: DeviceRGB
    DecodeParms: Dictionary
        Colors: 3
        Columns: 500
        Predictor: 15
    Filter: FlateDecode
    Height: 667
    Length: 820037
    Subtype: Image
    Type: XObject
    Width: 500
If this hasn't been looked at yet, can anyone provide an idea of what might be
different about this document that causes its images to be converted wrongly?
I've found that the image is already mangled just after calling the
getRGBImage() method of the PDJpeg object, so it isn't happening when drawing
the image to the page.
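
One entry in the dump that may be worth a closer look is Predictor: 15. In the
PDF specification, predictor values 10 to 15 select PNG prediction, and with 15
each row of the inflated data starts with its own filter-type byte (None, Sub,
Up, Average or Paeth). Whether this is related to the stripes is only a guess
on my part. The following is a minimal sketch of how such rows are normally
un-filtered, written from the PNG filter definitions; it is not PDFBox's actual
code, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: undoes PNG row prediction.
public class PngPredictorSketch {

    /**
     * Undo PNG prediction on inflated data in which every row is preceded
     * by one filter-type byte (0=None, 1=Sub, 2=Up, 3=Average, 4=Paeth).
     */
    static byte[] decode(byte[] filtered, int columns, int colors, int bitsPerComponent) {
        int bpp = Math.max(1, colors * bitsPerComponent / 8);        // bytes per pixel
        int rowLen = (columns * colors * bitsPerComponent + 7) / 8;  // row bytes without the tag
        int rows = filtered.length / (rowLen + 1);
        byte[] out = new byte[rows * rowLen];
        byte[] prev = new byte[rowLen]; // the row above the first row counts as all zeros

        for (int r = 0; r < rows; r++) {
            int src = r * (rowLen + 1);
            int type = filtered[src] & 0xFF; // filter type is chosen per row
            byte[] cur = Arrays.copyOfRange(filtered, src + 1, src + 1 + rowLen);

            for (int i = 0; i < rowLen; i++) {
                int left = i >= bpp ? cur[i - bpp] & 0xFF : 0;    // already decoded
                int up = prev[i] & 0xFF;
                int upLeft = i >= bpp ? prev[i - bpp] & 0xFF : 0;
                int predicted;
                switch (type) {
                    case 0: predicted = 0; break;
                    case 1: predicted = left; break;
                    case 2: predicted = up; break;
                    case 3: predicted = (left + up) / 2; break;
                    case 4: predicted = paeth(left, up, upLeft); break;
                    default:
                        throw new IllegalArgumentException("Unknown PNG filter type " + type);
                }
                cur[i] = (byte) ((cur[i] & 0xFF) + predicted); // wraps modulo 256
            }
            System.arraycopy(cur, 0, out, r * rowLen, rowLen);
            prev = cur;
        }
        return out;
    }

    /** Paeth predictor as defined for PNG: pick the neighbour closest to a + b - c. */
    static int paeth(int a, int b, int c) {
        int p = a + b - c;
        int pa = Math.abs(p - a);
        int pb = Math.abs(p - b);
        int pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) {
            return a;
        }
        return pb <= pc ? b : c;
    }
}
```

For the image above that would mean decode(data, 500, 3, 8): rows of 1500 data
bytes plus one tag byte each. A decoder that assumed a single filter type for
the whole image, instead of reading the tag byte of every row, would get some
rows right and others wrong.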
> convertToImage mangles images which were in the PDF
> ---------------------------------------------------
>
> Key: PDFBOX-958
> URL: https://issues.apache.org/jira/browse/PDFBOX-958
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.2.1, 1.4.0
> Environment: RHEL5 and WinXP, java version "1.6.0_23"
> Reporter: Eric Schwarzenbach
> Priority: Critical
> Attachments: Image of Page 13.jpeg, Image of Page 13.png, Wrycan®
> Lorem Ipsum Test.pdf
>
>
> Of the PDFs we've tried running through PDFBox and generating page images, a
> number of them (coming from disparate sources and methods of creation) seem
> to produce images where an image that was embedded in the page of the PDF
> shows up somewhat mangled. It seems to be divided into horizontal stripes,
> where some stripes look normal and others seem to have some kind of
> "smearing" effect going on. See attached images and original PDF (image is
> of page 13).
> I marked this as critical as we are trying to use PDFBox in a project where
> page images are crucial, and inability to produce reasonable looking page
> images is pretty much a deal breaker.
> The code we use to extract the images looks more or less like the following:
> BufferedImage image = page.convertToImage();
>
> SmartDeferredFileOutputStream outStream = new SmartDeferredFileOutputStream();
> String[] writerFormatNames = ImageIO.getWriterFormatNames();
> ImageIO.write(image, "jpeg", outStream);
> outStream.close();
> We've also tried specifying "png". In both "jpg" and "png" cases we get an
> image file that is indeed the correct format, and both images look exactly
> the same.