Re: Allowing subsampled/downscaled rendering of images, or rendering of subregions of images

Tilman Hausherr Thu, 01 Mar 2018 09:11:46 -0800

Thanks, sounds interesting. There's definitely a need for that. Just create an issue in JIRA with your text and your patch.

https://issues.apache.org/jira/browse/PDFBOX


About your patch:

Please remove any changes that are just reformatting. They make more work for us, because the diff shows more changes than there really are. I try to understand everything, not just test that it works. Example:


-                int r = clamp( (1.164f * (Y-16)) + (1.596f * (Cr - 128)) );
-                int g = clamp( (1.164f * (Y-16)) + (-0.392f * (Cb-128))+ (-0.813f * (Cr-128)));
-                int b = clamp( (1.164f * (Y-16)) + (2.017f * (Cb-128)));
+                int r = clamp((1.164f * (Y - 16)) + (1.596f * (Cr - 128)));
+                int g = clamp((1.164f * (Y - 16)) + (-0.392f * (Cb -128)) + (-0.813f * (Cr -
+                        128)));
+                int b = clamp((1.164f * (Y - 16)) + (2.017f * (Cb - 128)));

Be aware that if your patch changes the public API, then it won't be used in the 2.0 branch. (Your patch should still be against the trunk.)

Also make sure that your changes in SampledImageReader don't make the "normal" path (i.e. reading the entire stream and converting it to an image) slower. The current code is the result of several optimizations.

Public API (e.g. DecodeOptions) should have some javadoc. I have no idea what "honored" does.

The decode with METADATA_ONLY - does it mean nothing is decoded if there is a scratch file???


Tilman


On 01.03.2018 at 12:54, Itai wrote:

Hello,
Following a question asked on pdfbox-users [1], I set about trying to allow rendering images at lower resolutions, and additionally rendering only parts of images. The need arises from having very large images, usually JPEG or JBIG2, which are tens of megabytes in size when compressed, but may take up 8 or even more gigabytes when rendered as a BufferedImage at full resolution.
I have come up with a solution that seems to work (passes all of the built-in PDFBox tests, and a few manual ones I tried), but since it includes some deep changes in the logic I understand if it won't find its way into PDFBox.
While working on it, I also came across PDFBOX-3340 [2], and since my hack relies on making changes to the way filters work, it includes a (partial) fix for that bug too.
Finally, since I'm not well versed in git/github, I'm not sure of the best way to share my work. I attach here a unified diff, but let me know if there is another preferred method (pull request? clone the repository?)
Following is an explanation/description of my changes, for those interested. I would love to hear any feedback, especially for things which may increase the likelihood of such a feature being included in future versions of PDFBox.
Thanks,
Itai.

--
As stated, the issue pertains mainly to very large images (lots of pixels) which are highly compressed. Since DCTFilter, JBIG2Filter etc. render the entire image, I had to augment the way Filter works, to allow it to accept options.
This is where the class DecodeOptions comes in. It has sub-region and subsampling options (mirroring those of ImageReadParam), as well as a "metadata only" param. When decoding, you may pass DecodeOptions, such that image-related filters can downscale or only render a part of the image.
The "metadata-only" option is used for the `repair` method of PDImageXObject, as it only really needs the DecodeResult - where applicable and possible, a filter encountering this option will not decode the stream, only set the DecodeResult parameters (this is not always possible, e.g. for JPXFilter, which must decode the image to get the parameters).
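For readers unfamiliar with the ImageReadParam options that DecodeOptions is said to mirror, here is a minimal sketch using only the standard javax.imageio API. The class name RegionSubsampleDemo and the helper readPart are made up for illustration and are not part of the patch or of PDFBox; how the patch wires these options into the filters is not shown here.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class RegionSubsampleDemo {
    /** Hypothetical helper: decodes only `region`, keeping every `subsample`-th pixel. */
    static BufferedImage readPart(byte[] encoded, Rectangle region, int subsample)
            throws IOException {
        try (ImageInputStream in =
                ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader available for this data");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                if (region != null) {
                    param.setSourceRegion(region); // null region means the whole image
                }
                // Same step in x and y, no offset into the first sampling period.
                param.setSourceSubsampling(subsample, subsample, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Encode a small blank image in memory so the example needs no input file.
        BufferedImage src = new BufferedImage(64, 32, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(src, "png", out);

        BufferedImage part = readPart(out.toByteArray(), new Rectangle(0, 0, 32, 16), 2);
        System.out.println(part.getWidth() + "x" + part.getHeight());
    }
}
```

The point of pushing these options down to the decoder is that the returned BufferedImage is only as large as the requested region divided by the subsampling step, so the full-resolution raster never has to be allocated by the caller.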
The DecodeOptions also has an "honored" flag, which the filter sets to true if it honored the options - this is needed because when decoding an image stored in a Flate or LZW stream, the filter doesn't know the image format (or does it? I couldn't find a simple way of telling), so it can't make sense of subsampling or partial render options.
SampledImageReader checks this flag, and if it is not set to true it does the subsampling by itself.
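The "honored" handshake can be illustrated with a toy model. Everything in this sketch is hypothetical: the DecodeOptions class here models only subsampling, the Filter interface works on plain int arrays, and none of the names or signatures are taken from the actual patch.

```java
import java.util.Arrays;

public class HonoredFlagSketch {
    /** Hypothetical stand-in for DecodeOptions; only subsampling is modelled. */
    static final class DecodeOptions {
        final int subsample;      // 1 keeps every pixel, 2 every second pixel, ...
        private boolean honored;  // a filter sets this once it has applied the options

        DecodeOptions(int subsample) { this.subsample = subsample; }
        boolean isHonored() { return honored; }
        void setHonored(boolean honored) { this.honored = honored; }
    }

    /** Hypothetical filter interface: turns stored data into rows of pixels. */
    interface Filter {
        int[][] decode(int[][] stored, DecodeOptions options);
    }

    /** Image-aware filter (think DCT or JBIG2): subsamples while decoding and says so. */
    static final Filter IMAGE_AWARE = (stored, options) -> {
        options.setHonored(true);
        return subsample(stored, options.subsample);
    };

    /** Generic filter (think Flate or LZW): knows nothing about pixels, ignores options. */
    static final Filter GENERIC = (stored, options) -> stored;

    /** Keeps every `step`-th pixel in both directions. */
    static int[][] subsample(int[][] pixels, int step) {
        int h = (pixels.length + step - 1) / step;
        int w = (pixels[0].length + step - 1) / step;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = pixels[y * step][x * step];
            }
        }
        return out;
    }

    /** Reader side: falls back to its own subsampling when the filter did not honor it. */
    static int[][] read(int[][] stored, Filter filter, DecodeOptions options) {
        int[][] decoded = filter.decode(stored, options);
        if (!options.isHonored()) {
            decoded = subsample(decoded, options.subsample);
        }
        return decoded;
    }

    public static void main(String[] args) {
        int[][] image = new int[8][8];
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                image[y][x] = y * 8 + x;
            }
        }
        int[][] a = read(image, IMAGE_AWARE, new DecodeOptions(2));
        int[][] b = read(image, GENERIC, new DecodeOptions(2));
        System.out.println(a.length + "x" + a[0].length + ", same result: "
                + Arrays.deepEquals(a, b));
    }
}
```

Either way the caller ends up with the same pixels; the difference is that the generic path still materialises the full-size data before reducing it, which is why the savings are limited for Flate or LZW streams.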
This allows the addition of a method in PDImage:
    BufferedImage getImage(Rectangle region, int subsample) throws IOException;
The result is not cached, as it is not "canonical".
When drawing an image, PDPageDrawer calculates a subsampling factor based on the desired size:
    int subsample = (int)Math.floor(pdImage.getWidth()/at.getScaleX());
    if (subsample<1) subsample = 1;
    if (subsample>8) subsample = 8;
    drawBufferedImage(pdImage.getImage(null, subsample), at);
So if, for example, the image is to be drawn at 0.5 times its pixel size, it will be subsampled at 2-pixel intervals.
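The clamping arithmetic above can be tried in isolation. This sketch assumes that at.getScaleX() is the width, in device pixels, at which the image will be drawn (which holds when the transform has no rotation or shear); the class and method names are invented for the example.

```java
public class SubsampleFactor {
    /**
     * Same arithmetic as the PDPageDrawer snippet: source pixels per drawn pixel,
     * rounded down and clamped to the range 1..8.
     */
    static int subsampleFor(int imageWidth, double drawnWidth) {
        int subsample = (int) Math.floor(imageWidth / drawnWidth);
        return Math.max(1, Math.min(8, subsample));
    }

    public static void main(String[] args) {
        System.out.println(subsampleFor(4000, 2000));  // drawn at half size
        System.out.println(subsampleFor(4000, 8000));  // drawn enlarged
        System.out.println(subsampleFor(40000, 1000)); // drawn at 1/40 size
    }
}
```

The three calls give 2, 1 and 8: an enlarged image is never subsampled, and a heavily reduced one is capped at every 8th pixel.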
SampledImageReader issues the corresponding DecodeOptions to PDImage#createInputStream when rendering, and if the "honored" flag is not set, it does its own sub-sampling and partial rendering.
I realize most/all of those optimizations won't work for raw, Flate or LZW encoded images, but presumably those won't be too large in the first place. Also, this has little to no benefit for PDInlineImage, but as it already holds all of its raw data I assume little optimization is possible.
In general, this hack allowed me to speed up rendering of some files by significant margins (20%-80%, depending on size and desired DPI), and significantly lower the memory footprint if only a lower-res render is required, or rendering of small regions of the image.
--
[1]: https://lists.apache.org/thread.html/6b396e3d8bfc4ed44bcadf37881035d7447fb711253ef962f187455c@%3Cusers.pdfbox.apache.org%3E
[2]: https://issues.apache.org/jira/browse/PDFBOX-3340


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
