[ 
https://issues.apache.org/jira/browse/PDFBOX-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294979#comment-17294979
 ] 

Gábor Stefanik commented on PDFBOX-4831:
----------------------------------------

This is still an issue on 2.0.22.

 

The core issue is this code in PageDrawer.drawImage():
{code:java}
        if (!pdImage.getInterpolate())
        {
            // if the image is scaled down, we use smooth interpolation, eg PDFBOX-2364
            // only when scaled up do we use nearest neighbour, eg PDFBOX-2302 / mori-cvpr01.pdf
            // PDFBOX-4930: we use the sizes of the ARGB image. These can be different
            // than the original sizes of the base image, when the mask is bigger.
            boolean isScaledUp = pdImage.getImage().getWidth() < Math.round(at.getScaleX()) ||
                                 pdImage.getImage().getHeight() < Math.round(at.getScaleY());

            if (isScaledUp)
            {
                graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                        RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
            }
        }
{code}
Semantically, an image is "scaled up" if the canvas area it's being drawn on is 
bigger (higher resolution) than the image itself. However, this compares the 
image's resolution not to the size of the actual canvas we're drawing onto 
(which would be given by a combination of the "at" affine transform and the 
transform inside this.graphics), but the PDF's canonical 72DPI canvas. In the 
context of renderImageWithDPI, rather than checking if the image, at its 
present scale, has a higher DPI than the value passed to renderImageWithDPI, 
we're checking if it has a DPI higher than 72. So any time we're actually 
trying to render this image in >72DPI, this logic may report that an image is 
scaled down when in reality it's being scaled up.
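
To make the mismatch concrete, here is a minimal sketch of how the size in device pixels could be obtained by concatenating the graphics transform with "at". The class and method names are hypothetical, not PDFBox code, and the numbers assume the 2480x3504 image from the issue description drawn on a 595.2x840.96pt page at 300 DPI.

```java
import java.awt.geom.AffineTransform;

// Hypothetical helper, not part of PDFBox: measures the device-space size of
// the unit square that "at" maps the image onto.
final class EffectiveScaleSketch {

    // Returns {width, height} in device pixels for graphicsTransform * at.
    static double[] deviceSize(AffineTransform graphicsTransform, AffineTransform at) {
        AffineTransform combined = new AffineTransform(graphicsTransform);
        combined.concatenate(at);
        // Lengths of the transformed unit vectors; also works with flips and rotation.
        double width = Math.hypot(combined.getScaleX(), combined.getShearY());
        double height = Math.hypot(combined.getShearX(), combined.getScaleY());
        return new double[] { width, height };
    }

    public static void main(String[] args) {
        // Assumed values: page-space image transform and a 300 DPI device transform.
        AffineTransform at = AffineTransform.getScaleInstance(595.2, 840.96);
        AffineTransform device = AffineTransform.getScaleInstance(300.0 / 72.0, 300.0 / 72.0);
        double[] size = deviceSize(device, at);
        System.out.println("device size: " + size[0] + " x " + size[1]);
        // The existing check only looks at "at", i.e. the 72 DPI size.
        System.out.println("72 DPI check says scaled up: " + (2480 < Math.round(at.getScaleX())));
    }
}
```

With these assumed inputs the device-space size comes out at roughly the image's own pixel size, while the check based on "at" alone compares 2480 against 595 and reports "not scaled up".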

Additionally, in this scenario, checking for "not scaled down" would make more 
sense than strictly "scaled up", i.e. we shouldn't force interpolation if the 
image isn't being scaled at all. Care also must be taken to disregard any 
negligible "downscaling" resulting from floating point rounding errors, e.g. 
trying to render a 500px wide image onto an apparently 499.998472px canvas area 
shouldn't be considered downscaling, and later, the image should simply be 
drawn 1:1.

Because of this, we end up drawing the image with bicubic interpolation, 
overriding the lack of an interpolate flag on the image.

The "slow path" subsequently happens inside graphics.drawImage() 
(sun.java2d.SunGraphics2D.drawImage()). As we enter drawBufferedImage, 
imageTransform already suffers from visible rounding errors due to the use of 
single-precision floats in the code leading up to here. (AffineTransform is 
double-precision internally, and SunGraphics2D.drawImage() assumes that any 
AffineTransform was computed from double-precision inputs, so it uses limits 
appropriate for double precision to detect "almost-integer" values with a 
rounding error.) drawImage() then hands off to 
sun.java2d.pipe.DrawImage.transformImage(), where the "checkfinalxform" path is 
taken, and imageTransform is concatenated with the graphics canvas's previously 
set transform, which itself has rounding errors. The errors in the 2 transforms 
are thus compounded. (In my case, trying to effectively draw an image 1:1, I 
end up with AffineTransform[[0.999999982362152, 0.0, 0.0], [-0.0, 
0.999999915403138, 2.5431314134E-4]], which is almost AffineTransform[[1.0, 
0.0, 0.0], [0.0, 1.0, 0.0]], but off by just enough to cause problems.)
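
As a rough illustration of where this kind of error can come from, the sketch below compares a DPI scale factor computed in single precision with the same factor computed in double precision. The class and method names are hypothetical, and the 595.2pt page width is an assumed value matching a 2480px image at 300 DPI; this is not the actual PDFBox code path.

```java
// Hypothetical demo, not PDFBox code: single- vs double-precision scale factors.
final class FloatScaleSketch {

    // Scale computed with float arithmetic, then widened to double.
    static double scaleViaFloat(float dpi) {
        return dpi / 72f;
    }

    // Scale computed with double arithmetic throughout.
    static double scaleViaDouble(double dpi) {
        return dpi / 72.0;
    }

    public static void main(String[] args) {
        double viaFloat = scaleViaFloat(300f);
        double viaDouble = scaleViaDouble(300.0);
        System.out.println("float scale:  " + viaFloat);
        System.out.println("double scale: " + viaDouble);
        // Assumed page width in points for a 2480px image at 300 DPI.
        double pageWidthPt = 595.2;
        System.out.println("width via float:  " + pageWidthPt * viaFloat);
        System.out.println("width via double: " + pageWidthPt * viaDouble);
    }
}
```

The float result is only off in the seventh or eighth significant digit, but once it is multiplied by a page size of several hundred points, the error in the pixel width is already close to the order of magnitude that matters further down in the Java2D pipeline.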

In the inner transformImage() call, the "coords" array is assembled, and then 
it's transformed using the badly rounded "almost-identity" AffineTransform 
above. Prior to this transformation, coords[0] == coords[4] and coords[3] == 
coords[5], so optimizations inside tryCopyOrScale() would be taken (this would 
be the "fast path"), avoiding the need to call renderImageXform(), which 
actually performs the interpolated scaling. After coords is transformed, it's 
still "close enough" that we enter tryCopyOrScale(), but then inside, the error 
on the width coordinate exceeds MAX_TX_ERROR (0.0001), so the renderImageCopy() 
path for 1:1 output isn't taken. Instead, we drop into renderImageScale(), 
which would still be an optimization, and yield a pure black-and-white image, 
though somewhat slower than renderImageCopy(). However, renderImageScale() only 
applies if the scaling mode is nearest neighbor, and because of the earlier 
wrong determination that the image is being "scaled down", the scaling mode is 
instead bicubic, forcing DrawImage to bail out of the optimized path, and fall 
back to the slowest path of renderImageXform() with bicubic interpolation. As a 
result, we not only waste resources performing a needless interpolation, but 
also introduce blurriness to the image, and in the case of pure black-and-white 
images, introduce unwanted grayscale pixels into the output.
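
The decision chain described above can be modelled roughly as follows. This is a simplified sketch of my reading of the behaviour, not the actual JDK source: the class, enum and method names are hypothetical, and only the MAX_TX_ERROR value (0.0001) is taken from the description above.

```java
// Simplified model of the path selection described above; NOT the JDK code.
final class DrawPathSketch {

    enum Path { COPY, SCALE, XFORM }

    // Tolerance quoted above for treating a coordinate as "integer enough".
    static final double MAX_TX_ERROR = 0.0001;

    static Path choose(double transformedWidth, int imageWidth, boolean nearestNeighbor) {
        double error = Math.abs(transformedWidth - imageWidth);
        if (error < MAX_TX_ERROR) {
            return Path.COPY;   // 1:1 copy, the fast path
        }
        if (nearestNeighbor) {
            return Path.SCALE;  // still optimized, no interpolation
        }
        return Path.XFORM;      // full interpolated transform, the slow path
    }

    public static void main(String[] args) {
        // Assumed widths for illustration only.
        System.out.println(choose(2479.99999, 2480, false));
        System.out.println(choose(2479.9998, 2480, true));
        System.out.println(choose(2479.9998, 2480, false));
    }
}
```

In this model, either keeping the error below the tolerance or keeping nearest-neighbor interpolation selected is enough to stay off the slow path; we currently lose both.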

So, in summary, two things go wrong to produce this outcome:
 * First, isScaledUp is determined wrongly (comparison to wrong canvas, and 
wrong handling of equality case), causing the "no interpolation" setting on the 
image to be ignored in an effort to avoid PDFBOX-2364.
 * Then, due to careless use of single-precision math, we accumulate enough 
error in the affine transform matrices to make the Java graphics stack think 
it's seeing a request to rescale the image, rather than just rounding errors.

I would suggest the following fixes:
 * Introduce a new version of PDFRenderer.renderImage() that takes a double for 
the "scale" parameter. Call this from renderImageWithDPI(), rather than the 
single precision float version.
 * Take the graphics-side affine transform into account when checking if an 
image is being scaled up or down.
 * Treat an image that's not being scaled at all (1:1 scale) as if it were 
upscaled, and allow nearest-neighbor rendering. It may also make sense to add 
some tolerance for rounding errors here, e.g. allow nearest-neighbor if the 
canvas width/height, rounded up to the nearest integer, is no less than the 
corresponding image dimension (which is guaranteed to be an integer).
 * Consider storing COSFloat values internally as double precision.
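
A minimal sketch of the tolerant check from the third suggestion, assuming the device-space width and height of the drawn image have already been computed (i.e. with the graphics-side transform taken into account). The class and method names are hypothetical, and combining the two dimensions with "||" simply mirrors the existing isScaledUp logic; requiring both dimensions would be an equally defensible choice.

```java
// Hypothetical replacement for the isScaledUp test; not actual PDFBox code.
final class NotScaledDownSketch {

    // deviceWidth/deviceHeight: size of the target area in device pixels.
    // Rounding up absorbs tiny shortfalls caused by floating point error.
    static boolean isNotScaledDown(int imageWidth, int imageHeight,
                                   double deviceWidth, double deviceHeight) {
        return Math.ceil(deviceWidth) >= imageWidth
                || Math.ceil(deviceHeight) >= imageHeight;
    }

    public static void main(String[] args) {
        // Values from the issue description: effectively 1:1, off by rounding error.
        System.out.println(isNotScaledDown(2480, 3504, 2479.999964573, 3503.9999537378));
        // Genuine downscaling: the same image drawn at 72 DPI.
        System.out.println(isNotScaledDown(2480, 3504, 595.2, 840.96));
    }
}
```

When the check returns true, nearest-neighbor interpolation would be selected exactly as in the existing isScaledUp branch.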

> Rounding errors when rendering non-interleaved binary CCITT image at 1:1 
> scale cause gray pixels in output
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4831
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4831
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.19
>            Reporter: Gábor Stefanik
>            Priority: Major
>         Attachments: 13._Korona_szallo_vegzes_13.09.26.eredeti.pdf
>
>
> I have a 300dpi scanned PDF file with a single CCITT-encoded black-and-white 
> image in each page, spanning the whole page. The images all have a resolution 
> of 2480x3504.
>  
> When I try to render a page from this PDF into a PNG at 300DPI, the resulting 
> PNG has some pixels with colors #010101 and #fefefe. The PNG has the same 
> 2480x3504 dimensions as the embedded CCITT images, but stepping through the 
> PDFBox code reveals it's trying to downscale the image by a tiny fraction of 
> a pixel (e.g. to 2479.999964573x3503.9999537378) using bicubic interpolation, 
> introducing these "near-black" and "near-white" pixels due to rounding 
> errors. Additionally, the actual image conversion code goes to a slow path 
> intended for "proper" interpolated scaling, rather than hitting the fast path 
> for copying 1:1-scale images.
>  
> For now, we worked around this by treating images containing only #000000, 
> #010101, #fefefe and #ffffff as binary, but the performance hit from the slow 
> path is still there.



