Tilman Hausherr commented on PDFBOX-4137:

I did another review while 100% awake... some thoughts:
 - decode: needs a new name, e.g. decodeMeta
 - DecodeOptions is missing in the new patch, but it is in the old one
 - subsample: rename to "subsampling" to align with Java
 - honored: does it mean all options, or just subsampling? => needs a better name

Then I was wondering if something could go wrong if a stream is Flate encoded 
(which would do nothing if meta-only is on) and then JPX encoded, as JPXFilter 
is the only one that changes the parameters object. The JPXFilter would get a 
raw stream instead of an inflated one.
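
A minimal sketch of that concern, using only java.util.zip and no PDFBox code. The class, the `flateStage` method and the `metaOnly` flag are hypothetical stand-ins for the first filter in the chain; `looksLikeJp2` is a crude header check that only compares the first bytes of the JP2 signature box.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FilterChainSketch {
    // First 8 bytes of the JP2 signature box, used as a crude header check.
    static final byte[] JP2_SIGNATURE = {0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20};

    // Hypothetical payload: the signature followed by zero-filled dummy data.
    static byte[] samplePayload() {
        byte[] payload = new byte[JP2_SIGNATURE.length + 64];
        System.arraycopy(JP2_SIGNATURE, 0, payload, 0, JP2_SIGNATURE.length);
        return payload;
    }

    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input, stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not a valid deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical first filter: passes the data through untouched when metaOnly is set.
    static byte[] flateStage(byte[] encoded, boolean metaOnly) {
        return metaOnly ? encoded : inflate(encoded);
    }

    // What a following JPX stage would see at the start of its input.
    static boolean looksLikeJp2(byte[] data) {
        if (data.length < JP2_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < JP2_SIGNATURE.length; i++) {
            if (data[i] != JP2_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With `metaOnly` off, the second stage sees the JP2 signature. With it on, the second stage sees the still-deflated bytes and the header check fails.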

Try displaying page 3 of 067445.pdf. (I have not yet tested your patch, I have 
only read it)

If my understanding of your patch is correct, then it would mean we could 
remove the "meta only" part of your patch and just skip all decoding, depending 
on whether the last filter is JPX. This would mean fewer code changes in the 
filters.

I tried this simple solution (replaces an existing constructor) and it worked 
on my test set:
    public PDImageXObject(PDStream stream, PDResources resources) throws 
        super(stream, COSName.IMAGE);
        this.resources = resources;
        List<COSName> filters = stream.getFilters();
        if (filters != null && filters.size() > 0 && 
            try (COSInputStream is = stream.createInputStream())
                DecodeResult decodeResult = is.getDecodeResult();
                this.colorSpace = decodeResult.getJPXColorSpace();
This skips the decoding unless the last filter is JPX.
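
The "last filter" test on its own could look roughly like this. It is a standalone sketch that uses plain strings instead of COSName so it runs without PDFBox; the class and method names are made up for illustration, and "JPXDecode" / "FlateDecode" are the filter names from the PDF specification.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LastFilterSketch {
    // True only if the filter list is non-empty and its last entry equals the given name.
    static boolean lastFilterIs(List<String> filters, String name) {
        return filters != null
                && !filters.isEmpty()
                && name.equals(filters.get(filters.size() - 1));
    }

    public static void main(String[] args) {
        List<String> flateThenJpx = Arrays.asList("FlateDecode", "JPXDecode");
        List<String> flateOnly = Collections.singletonList("FlateDecode");
        System.out.println(lastFilterIs(flateThenJpx, "JPXDecode"));
        System.out.println(lastFilterIs(flateOnly, "JPXDecode"));
    }
}
```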

Sadly I can't see much of a speed increase, although there should be one. Maybe 
this is because there have been many optimizations since PDFBOX-3340. The file 
mentioned there renders in 25 milliseconds. Older versions of PDFBox need more 
than 100 milliseconds.

> Allow subsampled/downscaled rendering of images, and rendering subimages 
> -------------------------------------------------------------------------
>                 Key: PDFBOX-4137
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4137
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.8
>            Reporter: Itai Shaked
>            Priority: Minor
>         Attachments: 0001-Image-render-subsample.patch, 067445.pdf, 
> image_rendering_subsampling_hack.patch
> Suggested/contributed change to allow subsampling of images and rendering 
> sub-regions of images.  
> The need arises from having very large images which are highly compressed 
> (usually JPEG or JBIG2). The current implementation decodes the entire image 
> into memory at full resolution, even if rendering is done at a much lower 
> resolution. 
> Since the change required augmenting the way Filters work (to allow 
> partial/subsampled decoding), it also includes a partial fix for PDFBOX-3340. 
> This change introduces "DecodeOptions" which are currently only applicable 
> for images. They include requesting only metadata (for PDImageXObject's 
> repair method), subsampling and sub-region (similar to 
> javax.imageio.ImageReadParam). 
> Since not all filters can or do honor (use) the options, the DecodeOptions 
> class contains a flag. Filters that honor the options (subsample / decode 
> only requested region) set it to true. If the flag is false, the subsampling 
> or cropping should be done after decoding, to ensure consistency. 
> PageDrawer was modified so it uses subsampling based on the ratio of the 
> desired output to the original image. 
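
For reference, the subsampling and sub-region idea in the quoted description can be tried with plain javax.imageio, which is what the description compares the options to. This is only a sketch: the class and its helper methods are hypothetical, it is not the patch's DecodeOptions, and the integer-ratio rule in `subsamplingFor` is an assumption about how a factor might be derived from the output/original ratio.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class SubsamplingSketch {
    static {
        ImageIO.setUseCache(false); // keep everything in memory, no temp files
    }

    // Assumed rule: integer ratio of original size to target size, never below 1.
    static int subsamplingFor(int originalSize, int targetSize) {
        if (targetSize <= 0) {
            return 1;
        }
        return Math.max(1, originalSize / targetSize);
    }

    // Decodes every factor-th pixel, optionally restricted to a source region.
    static BufferedImage readSubsampled(byte[] encoded, int factor, Rectangle region) {
        try (ImageInputStream in =
                ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("no ImageReader for this data");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                if (region != null) {
                    param.setSourceRegion(region);
                }
                param.setSourceSubsampling(factor, factor, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper to produce test input: encodes an image as PNG bytes.
    static byte[] encodePng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether a reader actually saves memory this way depends on the format plugin. That is the same point the description makes about filters that do or do not honor the options.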
