[
https://issues.apache.org/jira/browse/PDFBOX-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294979#comment-17294979
]
Gábor Stefanik commented on PDFBOX-4831:
----------------------------------------
This is still an issue on 2.0.22.
The core issue is this code in PageDrawer.drawImage():
{code:java}
if (!pdImage.getInterpolate())
{
    // if the image is scaled down, we use smooth interpolation, eg PDFBOX-2364
    // only when scaled up do we use nearest neighbour, eg PDFBOX-2302 / mori-cvpr01.pdf
    // PDFBOX-4930: we use the sizes of the ARGB image. These can be different
    // than the original sizes of the base image, when the mask is bigger.
    boolean isScaledUp = pdImage.getImage().getWidth() < Math.round(at.getScaleX()) ||
                         pdImage.getImage().getHeight() < Math.round(at.getScaleY());
    if (isScaledUp)
    {
        graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
    }
}
{code}
Semantically, an image is "scaled up" if the canvas area it's being drawn on is
bigger (higher resolution) than the image itself. However, this check compares
the image's resolution not to the size of the actual canvas we're drawing onto
(which would be given by combining the "at" affine transform with the
transform inside this.graphics), but to the PDF's canonical 72 DPI canvas. In
the context of renderImageWithDPI, rather than checking whether the image, at
its present scale, has a higher DPI than the value passed to
renderImageWithDPI, we're checking whether it has a DPI higher than 72. So any
time we render this image at more than 72 DPI, this logic may report that the
image is scaled down when in reality it's being scaled up.
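
To illustrate the mismatch, here is a minimal, self-contained sketch (not PDFBox code). The class and method names are hypothetical, and the numbers are assumed from the issue description: a 2480x3504 image filling a 595.2 x 840.96 pt page, rendered at 300 DPI. The real transforms also contain a y-flip and translations, which the vector-length computation below ignores.

```java
import java.awt.geom.AffineTransform;

class EffectiveScaleSketch
{
    // Device-space width of the unit square mapped by "at", then by the graphics transform.
    static double deviceWidth(AffineTransform graphicsXform, AffineTransform at)
    {
        AffineTransform combined = new AffineTransform(graphicsXform);
        combined.concatenate(at); // combined = graphicsXform * at
        // Length of the transformed x unit vector (m00, m10); sign and rotation do not matter.
        return Math.hypot(combined.getScaleX(), combined.getShearY());
    }

    // Device-space height, computed the same way from the y unit vector (m01, m11).
    static double deviceHeight(AffineTransform graphicsXform, AffineTransform at)
    {
        AffineTransform combined = new AffineTransform(graphicsXform);
        combined.concatenate(at);
        return Math.hypot(combined.getShearX(), combined.getScaleY());
    }

    public static void main(String[] args)
    {
        // Assumed values, see the lead-in: page size in points and a 300 DPI render scale.
        AffineTransform at = AffineTransform.getScaleInstance(595.2, 840.96);
        AffineTransform graphicsXform = AffineTransform.getScaleInstance(300 / 72.0, 300 / 72.0);

        // The existing check only looks at "at", i.e. the size on the 72 DPI canvas.
        boolean oldIsScaledUp = 2480 < Math.round(at.getScaleX())
                || 3504 < Math.round(at.getScaleY());
        System.out.println("existing isScaledUp: " + oldIsScaledUp);
        System.out.println("device size: " + deviceWidth(graphicsXform, at)
                + " x " + deviceHeight(graphicsXform, at));
    }
}
```

With these assumed numbers the existing check compares 2480 against 595, while the combined transform shows the image covers about 2480 device pixels, i.e. it is not really being scaled down.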
Additionally, in this scenario, checking for "not scaled down" would make more
sense than strictly "scaled up", i.e. we shouldn't force interpolation if the
image isn't being scaled at all. Care also must be taken to disregard any
negligible "downscaling" resulting from floating point rounding errors, e.g.
trying to render a 500px wide image onto an apparently 499.998472px canvas area
shouldn't be considered downscaling, and later, the image should simply be
drawn 1:1.
Because of this, we end up drawing the image with bicubic interpolation,
overriding the lack of an interpolate flag on the image.
The "slow path" subsequently happens inside graphics.drawImage()
(sun.java2d.SunGraphics2D.drawImage()). By the time we enter drawBufferedImage,
imageTransform already suffers from visible rounding errors due to the use of
single-precision floats in the code leading up to here. AffineTransform is
double-precision internally, so SunGraphics2D.drawImage() assumes that any
AffineTransform was computed from double-precision inputs, and uses limits
appropriate for double precision to detect "almost-integer" values with a
rounding error. drawImage() then hands off to
sun.java2d.pipe.DrawImage.transformImage(), where the "checkfinalxform" path is
taken, and imageTransform is concatenated with the graphics canvas's previously
set transform, which itself has rounding errors. The errors in the 2 transforms
are thus compounded. (In my case, trying to effectively draw an image 1:1, I
end up with AffineTransform[[0.999999982362152, 0.0, 0.0], [-0.0,
0.999999915403138, 2.5431314134E-4]], which is almost AffineTransform[[1.0,
0.0, 0.0], [0.0, 1.0, 0.0]], but off by just enough to cause problems.)
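
The effect of single-precision inputs can be shown in isolation with a small sketch. It assumes (this is not taken from the PDFBox source quoted here) that the render scale is computed as dpi / 72f and that the page width is held as a float before being widened to double, as an AffineTransform would store it. The class and method names are hypothetical, and the 595.2 pt width is the assumed page width of a 2480 px scan at 300 DPI.

```java
class ScalePrecisionSketch
{
    // Device width when both the page width and the scale pass through float first.
    static double widthViaFloat(float pageWidthPt, int dpi)
    {
        float scale = dpi / 72f; // single-precision scale factor
        // Widen to double before multiplying, as happens once the values
        // are stored in a double-precision AffineTransform.
        return (double) pageWidthPt * (double) scale;
    }

    // Device width when everything is kept in double precision.
    static double widthViaDouble(double pageWidthPt, int dpi)
    {
        return pageWidthPt * (dpi / 72.0);
    }

    public static void main(String[] args)
    {
        double expected = 2480.0;
        System.out.println("float path error:  " + (widthViaFloat(595.2f, 300) - expected));
        System.out.println("double path error: " + (widthViaDouble(595.2, 300) - expected));
    }
}
```

The double path stays within ordinary double rounding of 2480, while the float path is off by many orders of magnitude more, which is the kind of deviation the limits tuned for double precision do not absorb.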
In the inner transformImage() call, the "coords" array is assembled, and then
it's transformed using the badly rounded "almost-identity" AffineTransform
above. Prior to this transformation, coords[0] == coords[4] and coords[3] ==
coords[5], so optimizations inside tryCopyOrScale() would be taken (this would
be the "fast path"), avoiding the need to call renderImageXform(), which
actually performs the interpolated scaling. After coords is transformed, it's
still "close enough" that we enter tryCopyOrScale(), but then inside, the error
on the width coordinate exceeds MAX_TX_ERROR (0.0001), so the renderImageCopy()
path for 1:1 output isn't taken. Instead, we drop into renderImageScale(),
which would still be an optimization, and yield a pure black-and-white image,
though somewhat slower than renderImageCopy(). However, renderImageScale() only
applies if the scaling mode is nearest neighbor, and because of the earlier
wrong determination that the image is being "scaled down", the scaling mode is
instead bicubic, forcing DrawImage to bail out of the optimized path, and fall
back to the slowest path of renderImageXform() with bicubic interpolation. As a
result, we not only waste resources performing a needless interpolation, but
also introduce blurriness to the image, and in the case of pure black-and-white
images, introduce unwanted grayscale pixels into the output.
So, in summary, two things go wrong to produce this outcome:
* First, isScaledUp is determined wrongly (comparison to wrong canvas, and
wrong handling of equality case), causing the "no interpolation" setting on the
image to be ignored in an effort to avoid PDFBOX-2364.
* Then, due to careless use of single-precision math, we accumulate enough
error in the affine transform matrices to make the Java graphics stack think
it's seeing a request to rescale the image, rather than just rounding errors.
I would suggest the following fixes:
* Introduce a new version of PDFRenderer.renderImage() that takes a double for
the "scale" parameter. Call this from renderImageWithDPI(), rather than the
single precision float version.
* Take the graphics-side affine transform into account when checking if an
image is being scaled up or down.
* Treat an image that's not being scaled at all (1:1 scale) as if it were
upscaled, and allow nearest-neighbor rendering. It may also make sense to add
some tolerance for rounding errors here, e.g. allow nearest-neighbor if the
canvas width/height, rounded up to the nearest integer, is no less than the
corresponding image dimension (which is guaranteed to be an integer).
* Consider storing COSFloat values internally as double precision.
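
A possible shape for the tolerant "not scaled down" check from the third suggestion is sketched below. This is only an illustration, not a patch: the class and method names are hypothetical, the device-space width and height are assumed to have been computed already from the combined transforms, and joining the two comparisons with || simply mirrors the existing isScaledUp expression.

```java
class NotScaledDownSketch
{
    // Returns true if nearest-neighbour rendering should be allowed, i.e. the
    // device-space target, rounded up to whole pixels, is no smaller than the image.
    static boolean allowNearestNeighbour(int imageWidth, int imageHeight,
                                         double deviceWidth, double deviceHeight)
    {
        // Math.ceil absorbs tiny shortfalls such as 499.998472 vs. 500.
        return imageWidth <= Math.ceil(deviceWidth)
                || imageHeight <= Math.ceil(deviceHeight);
    }

    public static void main(String[] args)
    {
        // Values from the issue description: a 2480x3504 image with a marginally smaller target.
        System.out.println(allowNearestNeighbour(2480, 3504, 2479.999964573, 3503.9999537378));
        // A genuine downscale to half size.
        System.out.println(allowNearestNeighbour(500, 500, 250.0, 250.0));
    }
}
```

Rounding up is a deliberately generous tolerance: any target less than one pixel short of the image size still counts as 1:1. A tighter epsilon could be used instead if that turns out to be too permissive.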
> Rounding errors when rendering non-interleaved binary CCITT image at 1:1
> scale cause gray pixels in output
> ----------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-4831
> URL: https://issues.apache.org/jira/browse/PDFBOX-4831
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.19
> Reporter: Gábor Stefanik
> Priority: Major
> Attachments: 13._Korona_szallo_vegzes_13.09.26.eredeti.pdf
>
>
> I have a 300dpi scanned PDF file with a single CCITT-encoded black-and-white
> image in each page, spanning the whole page. The images all have a resolution
> of 2480x3504.
>
> When I try to render a page from this PDF into a PNG at 300DPI, the resulting
> PNG has some pixels with colors #010101 and #fefefe. The PNG has the same
> 2480x3504 dimensions as the embedded CCITT images, but stepping through the
> PDFBox code reveals it's trying to downscale the image by a tiny fraction of
> a pixel (e.g. to 2479.999964573x3503.9999537378) using bicubic interpolation,
> introducing these "near-black" and "near-white" pixels due to rounding
> errors. Additionally, the actual image conversion code goes to a slow path
> intended for "proper" interpolated scaling, rather than hitting the fast path
> for copying 1:1-scale images.
>
> For now, we worked around this by treating images containing only #000000,
> #010101, #fefefe and #ffffff as binary, but the performance hit from the slow
> path is still there.