[
https://issues.apache.org/jira/browse/PDFBOX-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959558#comment-16959558
]
Michael Klink edited comment on PDFBOX-4674 at 10/25/19 9:10 AM:
-----------------------------------------------------------------
When I try to open your PDF in Adobe Reader, it warns me that an error exists
on the page and that it may not be displayed correctly.
Errors on the page imply that the page may be displayed differently in
different viewers: PDFBox shows the scanned page in that shadowy way, while
Adobe Reader here does not display the scanned page at all.
Garbage in - garbage out.
You should consider using non-broken PDFs.
In more detail:
The scanned images on the pages of your PDF are invalid. Their dictionaries
have these values:
{noformat}
/Filter/DCTDecode
/BitsPerComponent 5
{noformat}
According to the PDF specification, though:
{panel:title=My title}
*BitsPerComponent* - integer - _(Required except for image masks and images
that use the *JPXDecode* filter)_ The number of bits used to represent each
colour component. Only a single value shall be specified; the number of bits
shall be the same for all colour components. The value shall be _1, 2, 4, 8,_
or (from PDF 1.5) _16_. If *ImageMask* is _true_, this entry is optional, but
if specified, its value shall be _1_.
If the image stream uses a filter, the value of *BitsPerComponent* shall be
consistent with the size of the data samples that the filter delivers. In
particular, a *CCITTFaxDecode* or *JBIG2Decode* filter shall always deliver
1-bit samples, a *RunLengthDecode* or *DCTDecode* filter shall always deliver
8-bit samples, and an *LZWDecode* or *FlateDecode* filter shall deliver samples
of a specified size if a predictor function is used.
{panel}
Thus, *BitsPerComponent* must be one of _1, 2, 4, 8,_ and _16_ anyway, and for
a *DCTDecode* filter it must be _8_.
In your case it is _5_, i.e. invalid.
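The rule quoted above can be sketched as a small stand-alone check. This is only an illustration, not PDFBox code: the class name {{BitsPerComponentCheck}} and the method {{isValid}} are hypothetical, the filter is passed as a plain string without the leading slash, and the sketch ignores image masks, *JPXDecode* images (where the entry is optional), the PDF 1.5 restriction on _16_, and predictor handling for *LZWDecode*/*FlateDecode*.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: checks a BitsPerComponent value against the quoted rules. */
public class BitsPerComponentCheck {

    // Values the specification allows in general (16 only from PDF 1.5 on;
    // the version check is left out of this sketch).
    private static final Set<Integer> ALLOWED =
            new HashSet<>(Arrays.asList(1, 2, 4, 8, 16));

    /**
     * @param filter           filter name without the leading slash, or null if none
     * @param bitsPerComponent value of the /BitsPerComponent entry
     * @return true if the combination is consistent with the quoted rules
     */
    public static boolean isValid(String filter, int bitsPerComponent) {
        if (!ALLOWED.contains(bitsPerComponent)) {
            return false;
        }
        if (filter == null) {
            return true;
        }
        switch (filter) {
            case "CCITTFaxDecode":
            case "JBIG2Decode":
                // These filters always deliver 1-bit samples.
                return bitsPerComponent == 1;
            case "RunLengthDecode":
            case "DCTDecode":
                // These filters always deliver 8-bit samples.
                return bitsPerComponent == 8;
            default:
                // Other filters: only the general value set is checked here.
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("DCTDecode", 5)); // the values from this PDF
        System.out.println(isValid("DCTDecode", 8));
    }
}
```

With the values from the attached PDF, the first call prints {{false}} and the second prints {{true}}. In a PDFBox-based check the bit depth could presumably be taken from {{PDImageXObject.getBitsPerComponent()}}, but wiring that up is left out here.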
PDFBox apparently tries to render it nonetheless and the output is the garbage
you observed:
!es-page-image2455431271065294360.png!
was (Author: mkl):
When I try to open your PDF in Adobe Reader, it warns me that an error exists
on the page and that it may not be displayed correctly.
Errors on the page imply that the page may be displayed differently on
different viewers. You should consider using non-broken PDFs.
> PDF Page Render Background Image has Gray Smudges
> -------------------------------------------------
>
> Key: PDFBOX-4674
> URL: https://issues.apache.org/jira/browse/PDFBOX-4674
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.17
> Reporter: Joseph Jezerinac
> Priority: Major
> Attachments: bad_page_image.pdf, es-page-image2455431271065294360.png
>
>
> The following test produces a PNG that has gray smudges in it. I've attached
> the PDF and the PNG that is produced.
>
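> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import javax.imageio.ImageIO;
>
> import org.apache.commons.io.FileUtils; // assumed: Apache Commons IO
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.PDFRenderer;
> import org.junit.Assert; // assumed: JUnit 4
> import org.junit.Test;
>
> public class TestPdfPageImage {
>     @Test
>     public void testGetPageImage() throws IOException {
>         // Load the attached PDF from the test classpath.
>         try (PDDocument pdDocument = PDDocument.load(
>                 FileUtils.toFile(getClass().getResource("/bad_page_image.pdf")))) {
>             // Render the first page (index 0) at the default scale.
>             final PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
>             final BufferedImage bufferedImage = pdfRenderer.renderImage(0);
>             final Path tempPath = Files.createTempFile("es-page-image", ".png");
>             try {
>                 // Write the rendering as PNG and check that the file is not empty.
>                 final File tempFile = tempPath.toFile();
>                 ImageIO.write(bufferedImage, "png", tempFile);
>                 Assert.assertTrue(Files.size(tempPath) > 0);
>             } finally {
>                 Files.delete(tempPath);
>             }
>         }
>     }
> }
> {code}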