Hello,

Following a question asked on pdfbox-users [1], I set about allowing images to be rendered at lower resolutions, and additionally rendering only parts of images. The need arises from very large images, usually JPEG or JBIG2, which are tens of megabytes in size when compressed but may take up 8 gigabytes or more when rendered as a BufferedImage at full resolution.

I have come up with a solution that seems to work (it passes all of the built-in PDFBox tests, and a few manual ones I tried), but since it includes some deep changes in the logic, I understand if it won't find its way into PDFBox.

While working on it, I also came across PDFBOX-3340 [2], and since my hack relies on changing the way filters work, it includes a (partial) fix for that bug too.

Finally, since I'm not well versed in git/GitHub, I'm not sure of the best way to share my work. I attach a unified diff here, but let me know if there is another preferred method (pull request? clone the repository?).

An explanation of my changes follows, for those interested. I would love to hear any feedback, especially on things which may increase the likelihood of such a feature being included in future versions of PDFBox.

Thanks,
Itai.

--

As stated, the issue pertains mainly to very large images (lots of pixels) which are highly compressed. Since DCTFilter, JBIG2Filter etc. render the entire image, I had to augment the way Filter works so that it accepts options. This is where the class DecodeOptions comes in. It has sub-region and subsampling options (mirroring those of ImageReadParam), as well as a "metadata only" option. When decoding, you may pass a DecodeOptions, so that image-related filters can downscale or render only a part of the image.

The "metadata only" option is used for the `repair` method of PDImageXObject, as it only really needs the DecodeResult. Where applicable and possible, a filter encountering this option will not decode the stream and will only set the DecodeResult parameters (this is not always possible, e.g. for JPXFilter, which must decode the image to get the parameters).

DecodeOptions also has an "honored" flag, which a filter sets to true if it honored the options. This is needed because when decoding an image stored in a Flate or LZW stream, the filter doesn't know the image format (or does it? I couldn't find a simple way of telling), so it can't make sense of subsampling or partial-render options. SampledImageReader checks this flag, and if it is not set to true, it does the subsampling itself.

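To make the "honored" fallback concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the attached patch: the class SubsampleSketch, its nested Options holder and the method applyIfNotHonored are hypothetical names chosen for illustration, and the fallback works on an already decoded BufferedImage rather than on raw sample data as SampledImageReader would.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical sketch only; names mirror the description above, not the actual patch.
final class SubsampleSketch
{
    /** Minimal stand-in for the options holder described above. */
    static final class Options
    {
        final Rectangle region;  // null means "the whole image"
        final int subsampling;   // 1 means "keep every pixel"
        private boolean honored; // a filter sets this when it applied the options itself

        Options(Rectangle region, int subsampling)
        {
            this.region = region;
            this.subsampling = subsampling;
        }

        boolean isHonored()
        {
            return honored;
        }

        void setHonored(boolean honored)
        {
            this.honored = honored;
        }
    }

    /** If no filter honored the options, crop and subsample the fully decoded image here. */
    static BufferedImage applyIfNotHonored(BufferedImage full, Options options)
    {
        if (options.isHonored())
        {
            return full; // the filter already produced the reduced image
        }
        Rectangle bounds = new Rectangle(0, 0, full.getWidth(), full.getHeight());
        Rectangle r = options.region == null ? bounds : bounds.intersection(options.region);
        if (r.isEmpty())
        {
            throw new IllegalArgumentException("Region does not overlap the image");
        }
        int step = Math.max(1, options.subsampling);
        // Same size rule as ImageReadParam with zero offsets: ceil(size / step).
        int w = (r.width + step - 1) / step;
        int h = (r.height + step - 1) / step;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                // Keep one source pixel out of every step x step block.
                out.setRGB(x, y, full.getRGB(r.x + x * step, r.y + y * step));
            }
        }
        return out;
    }

    public static void main(String[] args)
    {
        BufferedImage src = new BufferedImage(10, 6, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = applyIfNotHonored(src, new Options(null, 2));
        System.out.println(small.getWidth() + "x" + small.getHeight());
    }
}
```

A filter that can subsample while decoding (DCT, JBIG2, JPX) would call setHonored(true), in which case the method above returns its input unchanged; for Flate or LZW data the flag stays false and the reduction happens after decoding, which saves memory in the final image but not during decoding.
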
This allows the addition of a method in PDImage:

    BufferedImage getImage(Rectangle region, int subsample) throws IOException;

Its result is not cached, as it is not "canonical".

When drawing an image, PDPageDrawer calculates a subsampling factor based on the desired size:

    int subsample = (int)Math.floor(pdImage.getWidth()/at.getScaleX());
    if (subsample<1) subsample = 1;
    if (subsample>8) subsample = 8;
    drawBufferedImage(pdImage.getImage(null, subsample), at);

So if, e.g., the image is to be drawn at 0.5 times its pixel size, it will be subsampled at 2-pixel intervals. SampledImageReader issues the corresponding DecodeOptions to PDImage#createInputStream when rendering, and if the "honored" flag is not set, it does its own subsampling and partial rendering.

I realize most or all of these optimizations won't work for raw, Flate or LZW encoded images, but presumably those won't be too large in the first place. Also, this has little to no benefit for PDInlineImage, but as it already holds all of its raw data, I assume little optimization is possible.

In general, this hack allowed me to speed up rendering of some files by significant margins (20%-80%, depending on size and desired DPI), and to significantly lower the memory footprint when only a lower-resolution render, or a render of a small region of the image, is required.

--

[1]: https://lists.apache.org/thread.html/6b396e3d8bfc4ed44bcadf37881035d7447fb711253ef962f187455c@%3Cusers.pdfbox.apache.org%3E
[2]: https://issues.apache.org/jira/browse/PDFBOX-3340

diff --git a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSInputStream.java b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSInputStream.java index a11445131..058ed5e81 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSInputStream.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSInputStream.java @@ -24,6 +24,8 @@ import java.io.IOException; import java.io.InputStream; import java.util.ArrayList; import java.util.List; + +import org.apache.pdfbox.filter.DecodeOptions; import org.apache.pdfbox.filter.DecodeResult; import org.apache.pdfbox.filter.Filter; import org.apache.pdfbox.io.RandomAccess; @@ -50,6 +52,12 @@ public final class COSInputStream extends FilterInputStream */ static COSInputStream create(List<Filter> filters, COSDictionary parameters, InputStream in, ScratchFile scratchFile) throws IOException + { + return create(filters, parameters, in, scratchFile, DecodeOptions.DEFAULT); + } + + static COSInputStream create(List<Filter> filters, COSDictionary parameters, InputStream in, + ScratchFile scratchFile, DecodeOptions options) throws IOException { List<DecodeResult> results = new ArrayList<>(); InputStream input = in; @@ -66,7 +74,7 @@ public final class COSInputStream extends FilterInputStream { // scratch file final RandomAccess buffer = scratchFile.createBuffer(); - DecodeResult result = filters.get(i).decode(input, new RandomAccessOutputStream(buffer), parameters, i); + DecodeResult result = filters.get(i).decode(input, new RandomAccessOutputStream(buffer), parameters, i, options); results.add(result); input = new RandomAccessInputStream(buffer) { @@ -81,7 +89,7 @@ public final class COSInputStream extends FilterInputStream { // in-memory ByteArrayOutputStream output = new ByteArrayOutputStream(); - DecodeResult result = filters.get(i).decode(input, output, parameters, i); + DecodeResult result = filters.get(i).decode(input, output, parameters, i, options); results.add(result); input = new ByteArrayInputStream(output.toByteArray()); } 
@@ -90,6 +98,46 @@ public final class COSInputStream extends FilterInputStream return new COSInputStream(input, results); } + public static DecodeResult decode(List<Filter> filters, COSDictionary parameters, InputStream in, + ScratchFile scratchFile) throws IOException { + DecodeResult result = DecodeResult.DEFAULT; + InputStream input = in; + if (filters.isEmpty()) + { + input = in; + } + else + { + // apply filters + for (int i = 0; i < filters.size(); i++) + { + if (scratchFile != null) + { + // scratch file + final RandomAccess buffer = scratchFile.createBuffer(); + result = filters.get(i).decode(input, new RandomAccessOutputStream(buffer), parameters, i, DecodeOptions.METADATA_ONLY); + input = new RandomAccessInputStream(buffer) + { + @Override + public void close() throws IOException + { + buffer.close(); + } + }; + } + else + { + // in-memory + ByteArrayOutputStream output = new ByteArrayOutputStream(); + result = filters.get(i).decode(input, output, parameters, i, DecodeOptions.METADATA_ONLY); + input = new ByteArrayInputStream(output.toByteArray()); + } + } + } + return result; + + } + private final List<DecodeResult> decodeResults; /** diff --git a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java index c3f3ddb5a..a8c8e22c8 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java @@ -26,6 +26,8 @@ import java.util.ArrayList; import java.util.List; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; +import org.apache.pdfbox.filter.DecodeOptions; +import org.apache.pdfbox.filter.DecodeResult; import org.apache.pdfbox.filter.Filter; import org.apache.pdfbox.filter.FilterFactory; import org.apache.pdfbox.io.IOUtils; @@ -159,6 +161,22 @@ public class COSStream extends COSDictionary implements Closeable */ public COSInputStream createInputStream() throws IOException { + return 
createInputStream(DecodeOptions.DEFAULT); + } + + public COSInputStream createInputStream(DecodeOptions options) throws IOException + { + checkClosed(); + if (isWriting) + { + throw new IllegalStateException("Cannot read while there is an open stream writer"); + } + ensureRandomAccessExists(true); + InputStream input = new RandomAccessInputStream(randomAccess); + return COSInputStream.create(getFilterList(), this, input, scratchFile, options); + } + + public DecodeResult decode() throws IOException { checkClosed(); if (isWriting) { @@ -166,7 +184,7 @@ public class COSStream extends COSDictionary implements Closeable } ensureRandomAccessExists(true); InputStream input = new RandomAccessInputStream(randomAccess); - return COSInputStream.create(getFilterList(), this, input, scratchFile); + return COSInputStream.decode(getFilterList(), this, input, scratchFile); } /** diff --git a/pdfbox/src/main/java/org/apache/pdfbox/filter/DCTFilter.java b/pdfbox/src/main/java/org/apache/pdfbox/filter/DCTFilter.java index eff70a428..efa70cb1f 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/filter/DCTFilter.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/filter/DCTFilter.java @@ -26,6 +26,7 @@ import java.io.OutputStream; import javax.imageio.IIOException; import javax.imageio.ImageIO; +import javax.imageio.ImageReadParam; import javax.imageio.ImageReader; import javax.imageio.metadata.IIOMetadata; import javax.imageio.metadata.IIOMetadataNode; @@ -51,10 +52,15 @@ final class DCTFilter extends Filter private static final String ADOBE = "Adobe"; @Override - public DecodeResult decode(InputStream encoded, OutputStream decoded, - COSDictionary parameters, int index) throws IOException + public DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary + parameters, int index, DecodeOptions options) throws IOException { - ImageReader reader = findImageReader("JPEG", "a suitable JAI I/O image filter is not installed"); + if (options.isMetadataOnly()) + { + return 
new DecodeResult(parameters); + } + ImageReader reader = findImageReader("JPEG", "a suitable JAI I/O image filter is not " + + "installed"); try (ImageInputStream iis = ImageIO.createImageInputStream(encoded)) { @@ -63,9 +69,15 @@ final class DCTFilter extends Filter { iis.seek(0); } - + reader.setInput(iis); - + ImageReadParam irp = reader.getDefaultReadParam(); + irp.setSourceSubsampling(options.getSubsamplingX(), options.getSubsamplingY(), + options.getSubsamplingOffsetX(), options.getSubsamplingOffsetY()); + irp.setSourceRegion(options.getSourceRegion()); + options.setHonored(true); + + String numChannels = getNumChannels(reader); // get the raster using horrible JAI workarounds @@ -73,29 +85,29 @@ final class DCTFilter extends Filter Raster raster; // Strategy: use read() for RGB or "can't get metadata" - // use readRaster() for CMYK and gray and as fallback if read() fails + // use readRaster() for CMYK and gray and as fallback if read() fails // after "can't get metadata" because "no meta" file was CMYK if ("3".equals(numChannels) || numChannels.isEmpty()) { try { - // I'd like to use ImageReader#readRaster but it is buggy and can't read RGB correctly - BufferedImage image = reader.read(0); + // I'd like to use ImageReader#readRaster but it is buggy and can't read RGB + // correctly + BufferedImage image = reader.read(0, irp); raster = image.getRaster(); - } - catch (IIOException e) + } catch (IIOException e) { // JAI can't read CMYK JPEGs using ImageReader#read or ImageIO.read but // fortunately ImageReader#readRaster isn't buggy when reading 4-channel files - LOG.debug("Couldn't read use read() for RGB image - using readRaster() as fallback", e); - raster = reader.readRaster(0, null); + LOG.debug("Couldn't read use read() for RGB image - using readRaster() as " + + "fallback", e); + raster = reader.readRaster(0, irp); } - } - else + } else { // JAI can't read CMYK JPEGs using ImageReader#read or ImageIO.read but // fortunately ImageReader#readRaster isn't 
buggy when reading 4-channel files - raster = reader.readRaster(0, null); + raster = reader.readRaster(0, irp); } // special handling for 4-component images @@ -106,11 +118,11 @@ final class DCTFilter extends Filter try { transform = getAdobeTransform(reader.getImageMetadata(0)); - } - catch (IIOException | NegativeArraySizeException e) + } catch (IIOException | NegativeArraySizeException e) { // we really tried asking nicely, now we're using brute force. - LOG.debug("Couldn't read usíng getAdobeTransform() - using getAdobeTransformByBruteForce() as fallback", e); + LOG.debug("Couldn't read usíng getAdobeTransform() - using " + + "getAdobeTransformByBruteForce() as fallback", e); transform = getAdobeTransformByBruteForce(iis); } int colorTransform = transform != null ? transform : 0; @@ -130,28 +142,33 @@ final class DCTFilter extends Filter default: throw new IllegalArgumentException("Unknown colorTransform"); } - } - else if (raster.getNumBands() == 3) + } else if (raster.getNumBands() == 3) { // BGR to RGB raster = fromBGRtoRGB(raster); } - DataBufferByte dataBuffer = (DataBufferByte)raster.getDataBuffer(); + DataBufferByte dataBuffer = (DataBufferByte) raster.getDataBuffer(); decoded.write(dataBuffer.getData()); - } - finally + } finally { reader.dispose(); } return new DecodeResult(parameters); } + @Override + public DecodeResult decode(InputStream encoded, OutputStream decoded, + COSDictionary parameters, int index) throws IOException + { + return decode(encoded, decoded, parameters, index, DecodeOptions.DEFAULT); + } + // reads the APP14 Adobe transform tag and returns its value, or 0 if unknown private Integer getAdobeTransform(IIOMetadata metadata) { - Element tree = (Element)metadata.getAsTree("javax_imageio_jpeg_image_1.0"); - Element markerSequence = (Element)tree.getElementsByTagName("markerSequence").item(0); + Element tree = (Element) metadata.getAsTree("javax_imageio_jpeg_image_1.0"); + Element markerSequence = (Element) 
tree.getElementsByTagName("markerSequence").item(0); NodeList app14AdobeNodeList = markerSequence.getElementsByTagName("app14Adobe"); if (app14AdobeNodeList != null && app14AdobeNodeList.getLength() > 0) { @@ -160,7 +177,7 @@ final class DCTFilter extends Filter } return 0; } - + // See in https://github.com/haraldk/TwelveMonkeys // com.twelvemonkeys.imageio.plugins.jpeg.AdobeDCT class for structure of APP14 segment private int getAdobeTransformByBruteForce(ImageInputStream iis) throws IOException @@ -196,8 +213,7 @@ final class DCTFilter extends Filter return app14[POS_TRANSFORM]; } } - } - else + } else { a = 0; } @@ -239,7 +255,7 @@ final class DCTFilter extends Filter value[0] = cyan; value[1] = magenta; value[2] = yellow; - value[3] = (int)K; + value[3] = (int) K; writableRaster.setPixel(x, y, value); } } @@ -264,9 +280,10 @@ final class DCTFilter extends Filter float K = value[3]; // YCbCr to RGB, see http://www.equasys.de/colorconversion.html - int r = clamp( (1.164f * (Y-16)) + (1.596f * (Cr - 128)) ); - int g = clamp( (1.164f * (Y-16)) + (-0.392f * (Cb-128)) + (-0.813f * (Cr-128))); - int b = clamp( (1.164f * (Y-16)) + (2.017f * (Cb-128))); + int r = clamp((1.164f * (Y - 16)) + (1.596f * (Cr - 128))); + int g = clamp((1.164f * (Y - 16)) + (-0.392f * (Cb - 128)) + (-0.813f * (Cr - + 128))); + int b = clamp((1.164f * (Y - 16)) + (2.017f * (Cb - 128))); // naive RGB to CMYK int cyan = 255 - r; @@ -277,7 +294,7 @@ final class DCTFilter extends Filter value[0] = cyan; value[1] = magenta; value[2] = yellow; - value[3] = (int)K; + value[3] = (int) K; writableRaster.setPixel(x, y, value); } } @@ -307,8 +324,9 @@ final class DCTFilter extends Filter } return writableRaster; } - - // returns the number of channels as a string, or an empty string if there is an error getting the meta data + + // returns the number of channels as a string, or an empty string if there is an error + // getting the meta data private String getNumChannels(ImageReader reader) { try @@ 
-318,25 +336,26 @@ final class DCTFilter extends Filter { return ""; } - IIOMetadataNode metaTree = (IIOMetadataNode) imageMetadata.getAsTree("javax_imageio_1.0"); - Element numChannelsItem = (Element) metaTree.getElementsByTagName("NumChannels").item(0); + IIOMetadataNode metaTree = (IIOMetadataNode) imageMetadata.getAsTree + ("javax_imageio_1.0"); + Element numChannelsItem = (Element) metaTree.getElementsByTagName("NumChannels").item + (0); if (numChannelsItem == null) { return ""; } return numChannelsItem.getAttribute("value"); - } - catch (IOException | NegativeArraySizeException e) + } catch (IOException | NegativeArraySizeException e) { LOG.debug("Couldn't read metadata - returning empty string", e); return ""; } - } + } // clamps value to 0-255 range private int clamp(float value) { - return (int)((value < 0) ? 0 : ((value > 255) ? 255 : value)); + return (int) ((value < 0) ? 0 : ((value > 255) ? 255 : value)); } @Override diff --git a/pdfbox/src/main/java/org/apache/pdfbox/filter/Filter.java b/pdfbox/src/main/java/org/apache/pdfbox/filter/Filter.java index 4fcaf43c6..0b06a305b 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/filter/Filter.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/filter/Filter.java @@ -59,26 +59,35 @@ public abstract class Filter /** * Decodes data, producing the original non-encoded data. 
- * @param encoded the encoded byte stream - * @param decoded the stream where decoded data will be written + * + * @param encoded the encoded byte stream + * @param decoded the stream where decoded data will be written * @param parameters the parameters used for decoding - * @param index the index to the filter being decoded + * @param index the index to the filter being decoded * @return repaired parameters dictionary, or the original parameters dictionary * @throws IOException if the stream cannot be decoded */ - public abstract DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, - int index) throws IOException; + public abstract DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary + parameters, + int index) throws IOException; + + public DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, + int index, DecodeOptions options) throws IOException + { + return decode(encoded, decoded, parameters, index); + } /** * Encodes data. 
- * @param input the byte stream to encode - * @param encoded the stream where encoded data will be written + * + * @param input the byte stream to encode + * @param encoded the stream where encoded data will be written * @param parameters the parameters used for encoding - * @param index the index to the filter being encoded + * @param index the index to the filter being encoded * @throws IOException if the stream cannot be encoded */ public final void encode(InputStream input, OutputStream encoded, COSDictionary parameters, - int index) throws IOException + int index) throws IOException { encode(input, encoded, parameters.asUnmodifiableDictionary()); } @@ -96,26 +105,25 @@ public abstract class Filter if (filter instanceof COSName && obj instanceof COSDictionary) { // PDFBOX-3932: The PDF specification requires "If there is only one filter and that - // filter has parameters, DecodeParms shall be set to the filter’s parameter dictionary" + // filter has parameters, DecodeParms shall be set to the filter’s parameter + // dictionary" // but tests show that Adobe means "one filter name object". 
- return (COSDictionary)obj; - } - else if (filter instanceof COSArray && obj instanceof COSArray) + return (COSDictionary) obj; + } else if (filter instanceof COSArray && obj instanceof COSArray) { - COSArray array = (COSArray)obj; + COSArray array = (COSArray) obj; if (index < array.size()) { COSBase objAtIndex = array.getObject(index); if (objAtIndex instanceof COSDictionary) { - return (COSDictionary)array.getObject(index); + return (COSDictionary) array.getObject(index); } } - } - else if (obj != null && !(filter instanceof COSArray || obj instanceof COSArray)) + } else if (obj != null && !(filter instanceof COSArray || obj instanceof COSArray)) { LOG.error("Expected DecodeParams to be an Array or Dictionary but found " + - obj.getClass().getName()); + obj.getClass().getName()); } return new COSDictionary(); } @@ -128,7 +136,8 @@ public abstract class Filter * @return The image reader for the format. * @throws MissingImageReaderException if no image reader is found. */ - protected static ImageReader findImageReader(String formatName, String errorCause) throws MissingImageReaderException + protected static ImageReader findImageReader(String formatName, String errorCause) throws + MissingImageReaderException { Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName); ImageReader reader = null; @@ -142,7 +151,8 @@ public abstract class Filter } if (reader == null) { - throw new MissingImageReaderException("Cannot read " + formatName + " image: " + errorCause); + throw new MissingImageReaderException("Cannot read " + formatName + " image: " + + errorCause); } return reader; } diff --git a/pdfbox/src/main/java/org/apache/pdfbox/filter/FlateFilter.java b/pdfbox/src/main/java/org/apache/pdfbox/filter/FlateFilter.java index 341413385..879b814fd 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/filter/FlateFilter.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/filter/FlateFilter.java @@ -25,6 +25,7 @@ import java.util.zip.DataFormatException; 
import java.util.zip.Deflater; import java.util.zip.DeflaterOutputStream; import java.util.zip.Inflater; + import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.apache.pdfbox.cos.COSDictionary; @@ -43,9 +44,13 @@ final class FlateFilter extends Filter private static final int BUFFER_SIZE = 16348; @Override - public DecodeResult decode(InputStream encoded, OutputStream decoded, - COSDictionary parameters, int index) throws IOException + public DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary + parameters, int index, DecodeOptions options) throws IOException { + if (options.isMetadataOnly()) + { + return new DecodeResult(parameters); + } final COSDictionary decodeParams = getDecodeParams(parameters, index); int predictor = decodeParams.getInt(COSName.PREDICTOR); @@ -63,13 +68,11 @@ final class FlateFilter extends Filter decoded.flush(); baos.reset(); bais.reset(); - } - else + } else { decompress(encoded, decoded); } - } - catch (DataFormatException e) + } catch (DataFormatException e) { // if the stream is corrupt a DataFormatException may occur LOG.error("FlateFilter: stop reading corrupt stream due to a DataFormatException"); @@ -80,60 +83,67 @@ final class FlateFilter extends Filter return new DecodeResult(parameters); } + @Override + public DecodeResult decode(InputStream encoded, OutputStream decoded, + COSDictionary parameters, int index) throws IOException + { + return decode(encoded, decoded, parameters, index, DecodeOptions.DEFAULT); + } + // Use Inflater instead of InflateInputStream to avoid an EOFException due to a probably // missing Z_STREAM_END, see PDFBOX-1232 for details - private void decompress(InputStream in, OutputStream out) throws IOException, DataFormatException - { + private void decompress(InputStream in, OutputStream out) throws IOException, + DataFormatException + { byte[] buf = new byte[2048]; // skip zlib header - in.read(buf,0,2); - int read = in.read(buf); - if 
(read > 0) - { + in.read(buf, 0, 2); + int read = in.read(buf); + if (read > 0) + { // use nowrap mode to bypass zlib-header and checksum to avoid a DataFormatException - Inflater inflater = new Inflater(true); - inflater.setInput(buf,0,read); - byte[] res = new byte[1024]; + Inflater inflater = new Inflater(true); + inflater.setInput(buf, 0, read); + byte[] res = new byte[1024]; boolean dataWritten = false; - while (true) - { + while (true) + { int resRead = 0; try { resRead = inflater.inflate(res); - } - catch(DataFormatException exception) + } catch (DataFormatException exception) { if (dataWritten) { // some data could be read -> don't throw an exception - LOG.warn("FlateFilter: premature end of stream due to a DataFormatException"); + LOG.warn("FlateFilter: premature end of stream due to a " + + "DataFormatException"); break; - } - else + } else { // nothing could be read -> re-throw exception throw exception; } } - if (resRead != 0) - { - out.write(res,0,resRead); + if (resRead != 0) + { + out.write(res, 0, resRead); dataWritten = true; - continue; - } - if (inflater.finished() || inflater.needsDictionary() || in.available() == 0) + continue; + } + if (inflater.finished() || inflater.needsDictionary() || in.available() == 0) { break; - } - read = in.read(buf); - inflater.setInput(buf,0,read); + } + read = in.read(buf); + inflater.setInput(buf, 0, read); } inflater.end(); } out.flush(); } - + @Override protected void encode(InputStream input, OutputStream encoded, COSDictionary parameters) throws IOException @@ -141,22 +151,22 @@ final class FlateFilter extends Filter int compressionLevel = Deflater.DEFAULT_COMPRESSION; try { - compressionLevel = Integer.parseInt(System.getProperty(Filter.SYSPROP_DEFLATELEVEL, "-1")); - } - catch (NumberFormatException ex) + compressionLevel = Integer.parseInt(System.getProperty(Filter.SYSPROP_DEFLATELEVEL, + "-1")); + } catch (NumberFormatException ex) { LOG.warn(ex.getMessage(), ex); } compressionLevel = Math.max(-1, 
Math.min(Deflater.BEST_COMPRESSION, compressionLevel)); Deflater deflater = new Deflater(compressionLevel); - try (DeflaterOutputStream out = new DeflaterOutputStream(encoded,deflater)) + try (DeflaterOutputStream out = new DeflaterOutputStream(encoded, deflater)) { int amountRead; int mayRead = input.available(); if (mayRead > 0) { - byte[] buffer = new byte[Math.min(mayRead,BUFFER_SIZE)]; - while ((amountRead = input.read(buffer, 0, Math.min(mayRead,BUFFER_SIZE))) != -1) + byte[] buffer = new byte[Math.min(mayRead, BUFFER_SIZE)]; + while ((amountRead = input.read(buffer, 0, Math.min(mayRead, BUFFER_SIZE))) != -1) { out.write(buffer, 0, amountRead); } diff --git a/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java b/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java index 756d47237..7f0fb4d8c 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java @@ -25,6 +25,7 @@ import java.io.InputStream; import java.io.OutputStream; import java.io.SequenceInputStream; import javax.imageio.ImageIO; +import javax.imageio.ImageReadParam; import javax.imageio.ImageReader; import javax.imageio.stream.ImageInputStream; import org.apache.commons.logging.Log; @@ -61,8 +62,8 @@ final class JBIG2Filter extends Filter } @Override - public DecodeResult decode(InputStream encoded, OutputStream decoded, - COSDictionary parameters, int index) throws IOException + public DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary + parameters, int index, DecodeOptions options) throws IOException { ImageReader reader = findImageReader("JBIG2", "jbig2-imageio is not installed"); if (reader.getClass().getName().contains("levigo")) @@ -73,6 +74,17 @@ final class JBIG2Filter extends Filter int bits = parameters.getInt(COSName.BITS_PER_COMPONENT, 1); COSDictionary params = getDecodeParams(parameters, index); + if (options.isMetadataOnly()) + { + return new 
DecodeResult(parameters); + } + + ImageReadParam irp = reader.getDefaultReadParam(); + irp.setSourceSubsampling(options.getSubsamplingX(), options.getSubsamplingY(), + options.getSubsamplingOffsetX(), options.getSubsamplingOffsetY()); + irp.setSourceRegion(options.getSourceRegion()); + options.setHonored(true); + InputStream source = encoded; if (params != null) { @@ -90,9 +102,8 @@ final class JBIG2Filter extends Filter BufferedImage image; try { - image = reader.read(0, reader.getDefaultReadParam()); - } - catch (Exception e) + image = reader.read(0, irp); + } catch (Exception e) { // wrap and rethrow any exceptions throw new IOException("Could not read JBIG2 image", e); @@ -128,9 +139,17 @@ final class JBIG2Filter extends Filter { reader.dispose(); } + return new DecodeResult(parameters); } + @Override + public DecodeResult decode(InputStream encoded, OutputStream decoded, + COSDictionary parameters, int index) throws IOException + { + return decode(encoded, decoded, parameters, index, DecodeOptions.DEFAULT); + } + @Override protected void encode(InputStream input, OutputStream encoded, COSDictionary parameters) throws IOException diff --git a/pdfbox/src/main/java/org/apache/pdfbox/filter/JPXFilter.java b/pdfbox/src/main/java/org/apache/pdfbox/filter/JPXFilter.java index c9f91cfbe..0a706b0c3 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/filter/JPXFilter.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/filter/JPXFilter.java @@ -24,9 +24,11 @@ import java.awt.image.WritableRaster; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; +import javax.imageio.ImageReadParam; import javax.imageio.ImageReader; import javax.imageio.stream.ImageInputStream; import javax.imageio.stream.MemoryCacheImageInputStream; + import org.apache.pdfbox.cos.COSDictionary; import org.apache.pdfbox.cos.COSName; import org.apache.pdfbox.pdmodel.graphics.color.PDJPXColorSpace; @@ -34,12 +36,12 @@ import 
org.apache.pdfbox.pdmodel.graphics.color.PDJPXColorSpace;

 /**
  * Decompress data encoded using the wavelet-based JPEG 2000 standard,
  * reproducing the original data.
- *
+ * <p>
  * Requires the Java Advanced Imaging (JAI) Image I/O Tools to be installed from java.net, see
  * <a href="http://download.java.net/media/jai-imageio/builds/release/1.1/">jai-imageio</a>.
  * Alternatively you can build from the source available in the
  * <a href="https://java.net/projects/jai-imageio-core/">jai-imageio-core svn repo</a>.
- *
+ * <p>
  * Mac OS X users should download the tar.gz file for linux and unpack it to obtain the
  * required jar files. The .so file can be safely ignored.
  *
@@ -49,12 +51,17 @@ import org.apache.pdfbox.pdmodel.graphics.color.PDJPXColorSpace;
 public final class JPXFilter extends Filter
 {
     @Override
-    public DecodeResult decode(InputStream encoded, OutputStream decoded,
-                                         COSDictionary parameters, int index) throws IOException
+    public DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary
+            parameters, int index, DecodeOptions options) throws IOException
     {
         DecodeResult result = new DecodeResult(new COSDictionary());
         result.getParameters().addAll(parameters);
-        BufferedImage image = readJPX(encoded, result);
+        BufferedImage image = readJPX(encoded, options, result);
+
+        if (options.isMetadataOnly())
+        {
+            return result;
+        }

         WritableRaster raster = image.getRaster();
         switch (raster.getDataBuffer().getDataType())
@@ -74,25 +81,39 @@ public final class JPXFilter extends Filter
                 return result;

             default:
-                throw new IOException("Data type " + raster.getDataBuffer().getDataType() + " not implemented");
-        }
+                throw new IOException("Data type " + raster.getDataBuffer().getDataType() + " not" +
+                        " implemented");
+        }
+    }
+
+    @Override
+    public DecodeResult decode(InputStream encoded, OutputStream decoded,
+                               COSDictionary parameters, int index) throws IOException
+    {
+        return decode(encoded, decoded, parameters, index, DecodeOptions.DEFAULT);
     }

     // try to read using JAI Image I/O
-    private BufferedImage readJPX(InputStream input, DecodeResult result) throws IOException
+    private BufferedImage readJPX(InputStream input, DecodeOptions options, DecodeResult result)
+            throws IOException
     {
-        ImageReader reader = findImageReader("JPEG2000", "Java Advanced Imaging (JAI) Image I/O Tools are not installed");
+        ImageReader reader = findImageReader("JPEG2000", "Java Advanced Imaging (JAI) Image I/O " +
+                "Tools are not installed");
         // PDFBOX-4121: ImageIO.createImageInputStream() is much slower
         try (ImageInputStream iis = new MemoryCacheImageInputStream(input))
         {
             reader.setInput(iis, true, true);
+            ImageReadParam irp = reader.getDefaultReadParam();
+            irp.setSourceRegion(options.getSourceRegion());
+            irp.setSourceSubsampling(options.getSubsamplingX(), options.getSubsamplingY(),
+                    options.getSubsamplingOffsetX(), options.getSubsamplingOffsetY());
+            options.setHonored(true);

             BufferedImage image;
             try
             {
-                image = reader.read(0);
-            }
-            catch (Exception e)
+                image = reader.read(0, irp);
+            } catch (Exception e)
             {
                 // wrap and rethrow any exceptions
                 throw new IOException("Could not read JPEG 2000 (JPX) image", e);
@@ -114,8 +135,8 @@ public final class JPXFilter extends Filter
             }

             // override dimensions, see PDFBOX-1735
-            parameters.setInt(COSName.WIDTH, image.getWidth());
-            parameters.setInt(COSName.HEIGHT, image.getHeight());
+            parameters.setInt(COSName.WIDTH, reader.getWidth(0));
+            parameters.setInt(COSName.HEIGHT, reader.getHeight(0));

             // extract embedded color space
             if (!parameters.containsKey(COSName.COLORSPACE))
@@ -124,8 +145,7 @@ public final class JPXFilter extends Filter
             }

             return image;
-        }
-        finally
+        } finally
         {
             reader.dispose();
         }
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/filter/LZWFilter.java b/pdfbox/src/main/java/org/apache/pdfbox/filter/LZWFilter.java
index a67d1c67b..8443e7ffc 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/filter/LZWFilter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/filter/LZWFilter.java
@@ -34,7 +34,6 @@ import org.apache.pdfbox.cos.COSDictionary;
 import org.apache.pdfbox.cos.COSName;

 /**
- *
  * This is the filter used for the LZWDecode filter.
  *
  * @author Ben Litchfield
@@ -56,17 +55,19 @@ public class LZWFilter extends Filter
      * The LZW end of data code.
      */
     public static final long EOD = 257;
-
+
     //BEWARE: codeTable must be local to each method, because there is only
     // one instance of each filter

-    /**
-     * {@inheritDoc}
-     */
+
     @Override
-    public DecodeResult decode(InputStream encoded, OutputStream decoded,
-                                         COSDictionary parameters, int index) throws IOException
+    public DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary
+            parameters, int index, DecodeOptions options) throws IOException
     {
+        if (options.isMetadataOnly())
+        {
+            return new DecodeResult(parameters);
+        }
         COSDictionary decodeParams = getDecodeParams(parameters, index);
         int predictor = decodeParams.getInt(COSName.PREDICTOR);
         int earlyChange = decodeParams.getInt(COSName.EARLY_CHANGE, 1);
@@ -88,15 +89,25 @@ public class LZWFilter extends Filter
             decoded.flush();
             baos.reset();
             bais.reset();
-        }
-        else
+        } else
         {
             doLZWDecode(encoded, decoded, earlyChange);
         }
         return new DecodeResult(parameters);
     }

-    private void doLZWDecode(InputStream encoded, OutputStream decoded, int earlyChange) throws IOException
+    /**
+     * {@inheritDoc}
+     */
+    @Override
+    public DecodeResult decode(InputStream encoded, OutputStream decoded,
+                               COSDictionary parameters, int index) throws IOException
+    {
+        return decode(encoded, decoded, parameters, index, DecodeOptions.DEFAULT);
+    }
+
+    private void doLZWDecode(InputStream encoded, OutputStream decoded, int earlyChange) throws
+            IOException
     {
         List<byte[]> codeTable = new ArrayList<>();
         int chunk = 9;
@@ -113,8 +124,7 @@ public class LZWFilter extends Filter
                     chunk = 9;
                     codeTable = createCodeTable();
                     prevCommand = -1;
-                }
-                else
+                } else
                 {
                     if (nextCommand < codeTable.size())
                     {
@@ -129,8 +139,7 @@ public class LZWFilter extends Filter
                             newData[data.length] = firstByte;
                             codeTable.add(newData);
                         }
-                    }
-                    else
+                    } else
                     {
                         checkIndexBounds(codeTable, prevCommand, in);
                         byte[] data = codeTable.get((int) prevCommand);
@@ -139,20 +148,20 @@ public class LZWFilter extends Filter
                         decoded.write(newData);
                         codeTable.add(newData);
                     }
-
+
                     chunk = calculateChunk(codeTable.size(), earlyChange);
                     prevCommand = nextCommand;
                 }
             }
-        }
-        catch (EOFException ex)
+        } catch (EOFException ex)
         {
             LOG.warn("Premature EOF in LZW stream, EOD code missing", ex);
         }
         decoded.flush();
     }

-    private void checkIndexBounds(List<byte[]> codeTable, long index, MemoryCacheImageInputStream in)
+    private void checkIndexBounds(List<byte[]> codeTable, long index, MemoryCacheImageInputStream
+            in)
             throws IOException
     {
         if (index < 0)
@@ -189,10 +198,9 @@ public class LZWFilter extends Filter
             byte by = (byte) r;
             if (inputPattern == null)
             {
-                inputPattern = new byte[] { by };
+                inputPattern = new byte[]{by};
                 foundCode = by & 0xff;
-            }
-            else
+            } else
             {
                 inputPattern = Arrays.copyOf(inputPattern, inputPattern.length + 1);
                 inputPattern[inputPattern.length - 1] = by;
@@ -204,18 +212,17 @@ public class LZWFilter extends Filter
                     out.writeBits(foundCode, chunk);
                     // create new table entry
                     codeTable.add(inputPattern);
-
+
                     if (codeTable.size() == 4096)
                     {
                         // code table is full
                         out.writeBits(CLEAR_TABLE, chunk);
                         codeTable = createCodeTable();
                     }
-
-                    inputPattern = new byte[] { by };
+
+                    inputPattern = new byte[]{by};
                     foundCode = by & 0xff;
-                }
-                else
+                } else
                 {
                     foundCode = newFoundCode;
                 }
@@ -226,19 +233,19 @@ public class LZWFilter extends Filter
             chunk = calculateChunk(codeTable.size() - 1, 1);
             out.writeBits(foundCode, chunk);
         }
-
+
         // PPDFBOX-1977: the decoder wouldn't know that the encoder would output
         // an EOD as code, so he would have increased his own code table and
         // possibly adjusted the chunk. Therefore, the encoder must behave as
         // if the code table had just grown and thus it must be checked it is
         // needed to adjust the chunk, based on an increased table size parameter
         chunk = calculateChunk(codeTable.size(), 1);
-
+
         out.writeBits(EOD, chunk);
-
+
         // pad with 0
         out.writeBits(0, 7);
-
+
         // must do or file will be empty :-(
         out.flush();
     }
@@ -248,7 +255,7 @@ public class LZWFilter extends Filter
      * Find the longest matching pattern in the code table.
      *
      * @param codeTable The LZW code table.
-     * @param pattern The pattern to be searched for.
+     * @param pattern   The pattern to be searched for.
      * @return The index of the longest matching pattern or -1 if nothing is
      * found.
      */
@@ -264,16 +271,16 @@ public class LZWFilter extends Filter
                 if (foundCode != -1)
                 {
                     // we already found pattern with size > 1
-                    return foundCode;
-                }
-                else if (pattern.length > 1)
+                    return foundCode;
+                } else if (pattern.length > 1)
                 {
                     // we won't find anything here anyway
                     return -1;
                 }
             }
             byte[] tryPattern = codeTable.get(i);
-            if ((foundCode != -1 || tryPattern.length > foundLen) && Arrays.equals(tryPattern, pattern))
+            if ((foundCode != -1 || tryPattern.length > foundLen) && Arrays.equals(tryPattern,
+                    pattern))
             {
                 foundCode = i;
                 foundLen = tryPattern.length;
@@ -291,7 +298,7 @@ public class LZWFilter extends Filter
         List<byte[]> codeTable = new ArrayList<>(4096);
         for (int i = 0; i < 256; ++i)
         {
-            codeTable.add(new byte[] { (byte) (i & 0xFF) });
+            codeTable.add(new byte[]{(byte) (i & 0xFF)});
         }
         codeTable.add(null); // 256 EOD
         codeTable.add(null); // 257 CLEAR_TABLE
@@ -301,9 +308,8 @@ public class LZWFilter extends Filter
     /**
      * Calculate the appropriate chunk size
      *
-     * @param tabSize the size of the code table
+     * @param tabSize     the size of the code table
      * @param earlyChange 0 or 1 for early chunk increase
-     *
      * @return a value between 9 and 12
      */
     private int calculateChunk(int tabSize, int earlyChange)
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/common/PDStream.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/common/PDStream.java
index 8f520f981..30a430ae8 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/common/PDStream.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/common/PDStream.java
@@ -32,6 +32,8 @@ import org.apache.pdfbox.cos.COSInputStream;
 import org.apache.pdfbox.cos.COSName;
 import org.apache.pdfbox.cos.COSNull;
 import org.apache.pdfbox.cos.COSStream;
+import org.apache.pdfbox.filter.DecodeOptions;
+import org.apache.pdfbox.filter.DecodeResult;
 import org.apache.pdfbox.filter.Filter;
 import org.apache.pdfbox.filter.FilterFactory;
 import org.apache.pdfbox.io.IOUtils;
@@ -229,6 +231,15 @@ public class PDStream implements COSObjectable
         return stream.createInputStream();
     }

+    public COSInputStream createInputStream(DecodeOptions options) throws IOException
+    {
+        return stream.createInputStream(options);
+    }
+
+    public DecodeResult decode() throws IOException {
+        return stream.decode();
+    }
+
     /**
      * This will get a stream with some filters applied but not others. This is
      * useful when doing images, ie filters = [flate,dct], we want to remove
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImage.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImage.java
index 891544beb..cf154808b 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImage.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImage.java
@@ -16,12 +16,14 @@
  */
 package org.apache.pdfbox.pdmodel.graphics.image;

-import java.awt.Paint;
+import java.awt.*;
 import java.awt.image.BufferedImage;
 import java.io.IOException;
 import java.io.InputStream;
 import java.util.List;
+
 import org.apache.pdfbox.cos.COSArray;
+import org.apache.pdfbox.filter.DecodeOptions;
 import org.apache.pdfbox.pdmodel.common.COSObjectable;
 import org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace;

@@ -34,24 +36,29 @@ public interface PDImage extends COSObjectable
 {
     /**
      * Returns the content of this image as an AWT buffered image with an (A)RGB color space.
-     * The size of the returned image is the larger of the size of the image itself or its mask.
+     * The size of the returned image is the larger of the size of the image itself or its mask.
+     *
      * @return content of this image as a buffered image.
      * @throws IOException
      */
     BufferedImage getImage() throws IOException;

+    BufferedImage getImage(Rectangle region, int subsample) throws IOException;
+
     /**
      * Returns an ARGB image filled with the given paint and using this image as a mask.
+     *
      * @param paint the paint to fill the visible portions of the image with
      * @return a masked image filled with the given paint
-     * @throws IOException if the image cannot be read
+     * @throws IOException           if the image cannot be read
      * @throws IllegalStateException if the image is not a stencil.
      */
     BufferedImage getStencilImage(Paint paint) throws IOException;
-
+
     /**
      * Returns an InputStream containing the image data, irrespective of whether this is an
      * inline image or an image XObject.
+     *
      * @return Decoded stream
      * @throws IOException if the data could not be read.
      */
@@ -60,12 +67,15 @@ public interface PDImage extends COSObjectable
     /**
      * Returns an InputStream containing the image data, irrespective of whether this is an
      * inline image or an image XObject. The given filters will not be decoded.
+     *
      * @param stopFilters A list of filters to stop decoding at.
      * @return Decoded stream
      * @throws IOException if the data could not be read.
      */
     InputStream createInputStream(List<String> stopFilters) throws IOException;

+    public InputStream createInputStream(DecodeOptions options) throws IOException;
+
     /**
      * Returns true if the image has no data.
      */
@@ -79,6 +89,7 @@ public interface PDImage extends COSObjectable
     /**
      * Sets whether or not the image is a stencil.
      * This corresponds to the {@code ImageMask} entry in the image stream's dictionary.
+     *
      * @param isStencil True to make the image a stencil.
      */
     void setStencil(boolean isStencil);
@@ -90,18 +101,21 @@ public interface PDImage extends COSObjectable

     /**
      * Set the number of bits per component.
+     *
      * @param bitsPerComponent The number of bits per component.
      */
     void setBitsPerComponent(int bitsPerComponent);

     /**
      * Returns the image's color space.
+     *
      * @throws IOException If there is an error getting the color space.
      */
     PDColorSpace getColorSpace() throws IOException;

     /**
      * Sets the color space for this image.
+     *
      * @param colorSpace The color space for this image.
      */
     void setColorSpace(PDColorSpace colorSpace);
@@ -113,6 +127,7 @@ public interface PDImage extends COSObjectable

     /**
      * Sets the height of the image.
+     *
      * @param height The height of the image.
      */
     void setHeight(int height);
@@ -124,13 +139,15 @@ public interface PDImage extends COSObjectable

     /**
      * Sets the width of the image.
+     *
      * @param width The width of the image.
      */
     void setWidth(int width);

     /**
      * Sets the decode array.
-     * @param decode the new decode array.
+     *
+     * @param decode the new decode array.
      */
     void setDecode(COSArray decode);

diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java
index 1f8727364..8a5476d19 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java
@@ -16,9 +16,7 @@
  */
 package org.apache.pdfbox.pdmodel.graphics.image;

-import java.awt.Graphics2D;
-import java.awt.Paint;
-import java.awt.RenderingHints;
+import java.awt.*;
 import java.awt.image.BufferedImage;
 import java.awt.image.WritableRaster;
 import java.io.BufferedInputStream;
@@ -31,6 +29,7 @@ import java.io.OutputStream;
 import java.lang.ref.SoftReference;
 import java.util.List;
 import javax.imageio.ImageIO;
+
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.pdfbox.cos.COSArray;
@@ -39,6 +38,8 @@ import org.apache.pdfbox.cos.COSInputStream;
 import org.apache.pdfbox.cos.COSName;
 import org.apache.pdfbox.cos.COSObject;
 import org.apache.pdfbox.cos.COSStream;
+import org.apache.pdfbox.filter.DecodeOptions;
+import org.apache.pdfbox.filter.DecodeResult;
 import org.apache.pdfbox.io.IOUtils;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.PDResources;
@@ -73,7 +74,8 @@ public final class PDImageXObject extends PDXObject implements PDImage

     /**
      * Creates an Image XObject in the given document. This constructor is for internal PDFBox use
-     * and is not for PDF generation. Users who want to create images should look at {@link #createFromFileByExtension(File, PDDocument)
+     * and is not for PDF generation. Users who want to create images should look at {@link
+     * #createFromFileByExtension(File, PDDocument)
      * }.
      *
      * @param document the current document
@@ -89,18 +91,18 @@ public final class PDImageXObject extends PDXObject implements PDImage
      * constructor is for internal PDFBox use and is not for PDF generation. Users who want to
      * create images should look at {@link #createFromFileByExtension(File, PDDocument) }.
      *
-     * @param document the current document
-     * @param encodedStream an encoded stream of image data
-     * @param cosFilter the filter or a COSArray of filters
-     * @param width the image width
-     * @param height the image height
+     * @param document         the current document
+     * @param encodedStream    an encoded stream of image data
+     * @param cosFilter        the filter or a COSArray of filters
+     * @param width            the image width
+     * @param height           the image height
      * @param bitsPerComponent the bits per component
-     * @param initColorSpace the color space
+     * @param initColorSpace   the color space
      * @throws IOException if there is an error creating the XObject.
      */
-    public PDImageXObject(PDDocument document, InputStream encodedStream,
-            COSBase cosFilter, int width, int height, int bitsPerComponent,
-            PDColorSpace initColorSpace) throws IOException
+    public PDImageXObject(PDDocument document, InputStream encodedStream,
+                          COSBase cosFilter, int width, int height, int bitsPerComponent,
+                          PDColorSpace initColorSpace) throws IOException
     {
         super(createRawStream(document, encodedStream), COSName.IMAGE);
         getCOSObject().setItem(COSName.FILTER, cosFilter);
@@ -117,25 +119,26 @@ public final class PDImageXObject extends PDXObject implements PDImage
      * constructor is for internal PDFBox use and is not for PDF generation. Users who want to
      * create images should look at {@link #createFromFileByExtension(File, PDDocument) }.
      *
-     * @param stream the XObject stream to read
+     * @param stream    the XObject stream to read
      * @param resources the current resources
      * @throws java.io.IOException if there is an error creating the XObject.
      */
     public PDImageXObject(PDStream stream, PDResources resources) throws IOException
     {
-        this(stream, resources, stream.createInputStream());
+        this(stream, resources, stream.decode());
     }
-
+
     // repairs parameters using decode result
-    private PDImageXObject(PDStream stream, PDResources resources, COSInputStream input)
+    private PDImageXObject(PDStream stream, PDResources resources, DecodeResult decodeResult)
     {
-        super(repair(stream, input), COSName.IMAGE);
+        super(repair(stream, decodeResult), COSName.IMAGE);
         this.resources = resources;
-        this.colorSpace = input.getDecodeResult().getJPXColorSpace();
+        this.colorSpace = decodeResult.getJPXColorSpace();
     }

     /**
      * Creates a thumbnail Image XObject from the given COSBase and name.
+     *
      * @param cosStream the COS stream
      * @return an XObject
      * @throws IOException if there is an error creating the XObject.
@@ -162,14 +165,15 @@ public final class PDImageXObject extends PDXObject implements PDImage
     }

     /**
-     * Create a PDImageXObject from an image file, see {@link #createFromFileByExtension(File, PDDocument)} for
+     * Create a PDImageXObject from an image file, see
+     * {@link #createFromFileByExtension(File, PDDocument)} for
      * more details.
      *
      * @param imagePath the image file path.
-     * @param doc the document that shall use this PDImageXObject.
+     * @param doc       the document that shall use this PDImageXObject.
      * @return a PDImageXObject.
      * @throws IOException if there is an error when reading the file or creating the
-     * PDImageXObject, or if the image type is not supported.
+     *                     PDImageXObject, or if the image type is not supported.
      */
     public static PDImageXObject createFromFile(String imagePath, PDDocument doc) throws IOException
     {
@@ -185,13 +189,14 @@ public final class PDImageXObject extends PDXObject implements PDImage
      * PDImageXObject from a BufferedImage).
      *
      * @param file the image file.
-     * @param doc the document that shall use this PDImageXObject.
+     * @param doc  the document that shall use this PDImageXObject.
      * @return a PDImageXObject.
-     * @throws IOException if there is an error when reading the file or creating the
-     * PDImageXObject.
+     * @throws IOException              if there is an error when reading the file or creating the
+     *                                  PDImageXObject.
      * @throws IllegalArgumentException if the image type is not supported.
      */
-    public static PDImageXObject createFromFileByExtension(File file, PDDocument doc) throws IOException
+    public static PDImageXObject createFromFileByExtension(File file, PDDocument doc) throws
+            IOException
     {
         String name = file.getName();
         int dot = file.getName().lastIndexOf('.');
@@ -228,20 +233,21 @@ public final class PDImageXObject extends PDXObject implements PDImage
      * PDImageXObject from a BufferedImage).
      *
      * @param file the image file.
-     * @param doc the document that shall use this PDImageXObject.
+     * @param doc  the document that shall use this PDImageXObject.
      * @return a PDImageXObject.
-     * @throws IOException if there is an error when reading the file or creating the
-     * PDImageXObject.
+     * @throws IOException              if there is an error when reading the file or creating the
+     *                                  PDImageXObject.
      * @throws IllegalArgumentException if the image type is not supported.
      */
-    public static PDImageXObject createFromFileByContent(File file, PDDocument doc) throws IOException
+    public static PDImageXObject createFromFileByContent(File file, PDDocument doc) throws
+            IOException
     {
         FileType fileType = null;
-        try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(file)))
+        try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new
+                FileInputStream(file)))
         {
             fileType = FileTypeDetector.detectFileType(bufferedInputStream);
-        }
-        catch (IOException e)
+        } catch (IOException e)
         {
             throw new IOException("Could not determine file type: " + file.getName(), e);
         }
@@ -261,7 +267,8 @@ public final class PDImageXObject extends PDXObject implements PDImage
         {
             return CCITTFactory.createFromFile(doc, file);
         }
-        if (fileType.equals(FileType.BMP) || fileType.equals(FileType.GIF) || fileType.equals(FileType.PNG))
+        if (fileType.equals(FileType.BMP) || fileType.equals(FileType.GIF) || fileType.equals
+                (FileType.PNG))
         {
             BufferedImage bim = ImageIO.read(file);
             return LosslessFactory.createFromImage(doc, bim);
@@ -278,21 +285,21 @@ public final class PDImageXObject extends PDXObject implements PDImage
      * PDImageXObject from a BufferedImage).
      *
      * @param byteArray bytes from an image file.
-     * @param document the document that shall use this PDImageXObject.
-     * @param name name of image file for exception messages, can be null.
+     * @param document  the document that shall use this PDImageXObject.
+     * @param name      name of image file for exception messages, can be null.
      * @return a PDImageXObject.
-     * @throws IOException if there is an error when reading the file or creating the
-     * PDImageXObject.
+     * @throws IOException              if there is an error when reading the file or creating the
+     *                                  PDImageXObject.
      * @throws IllegalArgumentException if the image type is not supported.
      */
-    public static PDImageXObject createFromByteArray(PDDocument document, byte[] byteArray, String name) throws IOException
+    public static PDImageXObject createFromByteArray(PDDocument document, byte[] byteArray,
+                                                     String name) throws IOException
     {
         FileType fileType;
         try
         {
             fileType = FileTypeDetector.detectFileType(byteArray);
-        }
-        catch (IOException e)
+        } catch (IOException e)
         {
             throw new IOException("Could not determine file type: " + name, e);
         }
@@ -309,7 +316,8 @@ public final class PDImageXObject extends PDXObject implements PDImage
         {
             return CCITTFactory.createFromByteArray(document, byteArray);
         }
-        if (fileType.equals(FileType.BMP) || fileType.equals(FileType.GIF) || fileType.equals(FileType.PNG))
+        if (fileType.equals(FileType.BMP) || fileType.equals(FileType.GIF) || fileType.equals
+                (FileType.PNG))
         {
             ByteArrayInputStream bais = new ByteArrayInputStream(byteArray);
             BufferedImage bim = ImageIO.read(bais);
@@ -319,14 +327,15 @@ public final class PDImageXObject extends PDXObject implements PDImage
     }

     // repairs parameters using decode result
-    private static PDStream repair(PDStream stream, COSInputStream input)
+    private static PDStream repair(PDStream stream, DecodeResult decodeResult)
     {
-        stream.getCOSObject().addAll(input.getDecodeResult().getParameters());
+        stream.getCOSObject().addAll(decodeResult.getParameters());
         return stream;
     }

     /**
      * Returns the metadata associated with this XObject, or null if there is none.
+     *
      * @return the metadata associated with this object.
      */
     public PDMetadata getMetadata()
@@ -341,6 +350,7 @@ public final class PDImageXObject extends PDXObject implements PDImage

     /**
      * Sets the metadata associated with this XObject, or null if there is none.
+     *
      * @param meta the metadata associated with this object
      */
     public void setMetadata(PDMetadata meta)
@@ -350,6 +360,7 @@ public final class PDImageXObject extends PDXObject implements PDImage

     /**
      * Returns the key of this XObject in the structural parent tree.
+     *
      * @return this object's key the structural parent tree
      */
     public int getStructParent()
@@ -359,6 +370,7 @@ public final class PDImageXObject extends PDXObject implements PDImage

     /**
      * Sets the key of this XObject in the structural parent tree.
+     *
      * @param key the new key for this XObject
      */
     public void setStructParent(int key)
@@ -381,17 +393,25 @@ public final class PDImageXObject extends PDXObject implements PDImage
                 return cached;
             }
         }
+        BufferedImage image = getImage(null, 1);
+        cachedImage = new SoftReference<>(image);
+        return image;
+    }

+    @Override
+    public BufferedImage getImage(Rectangle region, int subsample) throws IOException
+    {
         // get image as RGB
-        BufferedImage image = SampledImageReader.getRGBImage(this, getColorKeyMask());
+        BufferedImage image = SampledImageReader.getRGBImage(this, region, subsample,
+                getColorKeyMask());
+

         // soft mask (overrides explicit mask)
         PDImageXObject softMask = getSoftMask();
         if (softMask != null)
         {
             image = applyMask(image, softMask.getOpaqueImage(), true);
-        }
-        else
+        } else
         {
             // explicit mask - to be applied only if /ImageMask true
             PDImageXObject mask = getMask();
@@ -401,10 +421,11 @@ public final class PDImageXObject extends PDXObject implements PDImage
             }
         }

-        cachedImage = new SoftReference<>(image);
         return image;
+
     }
+

     /**
      * {@inheritDoc}
      * The returned images are not cached.
@@ -422,6 +443,7 @@ public final class PDImageXObject extends PDXObject implements PDImage
     /**
      * Returns an RGB buffered image containing the opaque image stream without any masks applied.
      * If this Image XObject is a mask then the buffered image will contain the raw mask.
+     *
      * @return the image without any masks applied
      * @throws IOException if the image cannot be read
      */
@@ -447,8 +469,7 @@ public final class PDImageXObject extends PDXObject implements PDImage
         if (mask.getWidth() < width || mask.getHeight() < height)
         {
             mask = scaleImage(mask, width, height);
-        }
-        else if (mask.getWidth() > width || mask.getHeight() > height)
+        } else if (mask.getWidth() > width || mask.getHeight() > height)
         {
             width = mask.getWidth();
             height = mask.getHeight();
@@ -473,13 +494,12 @@ public final class PDImageXObject extends PDXObject implements PDImage
                 rgba[0] = rgb[0];
                 rgba[1] = rgb[1];
                 rgba[2] = rgb[2];
-
+
                 alphaPixel = alpha.getPixel(x, y, alphaPixel);
                 if (isSoft)
                 {
                     rgba[3] = alphaPixel[0];
-                }
-                else
+                } else
                 {
                     rgba[3] = 255 - alphaPixel[0];
                 }
@@ -499,9 +519,9 @@ public final class PDImageXObject extends PDXObject implements PDImage
         BufferedImage image2 = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
         Graphics2D g = image2.createGraphics();
         g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
-                           RenderingHints.VALUE_INTERPOLATION_BICUBIC);
+                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
         g.setRenderingHint(RenderingHints.KEY_RENDERING,
-                           RenderingHints.VALUE_RENDER_QUALITY);
+                RenderingHints.VALUE_RENDER_QUALITY);
         g.drawImage(image, 0, 0, width, height, 0, 0, image.getWidth(), image.getHeight(), null);
         g.dispose();
         return image2;
@@ -509,6 +529,7 @@ public final class PDImageXObject extends PDXObject implements PDImage

     /**
      * Returns the Mask Image XObject associated with this image, or null if there is none.
+     *
      * @return Mask Image XObject
      * @throws java.io.IOException
      */
@@ -519,8 +540,7 @@ public final class PDImageXObject extends PDXObject implements PDImage
         {
             // color key mask, no explicit mask to return
             return null;
-        }
-        else
+        } else
         {
             COSStream cosStream = (COSStream) getCOSObject().getDictionaryObject(COSName.MASK);
             if (cosStream != null)
@@ -534,6 +554,7 @@ public final class PDImageXObject extends PDXObject implements PDImage

     /**
      * Returns the color key mask array associated with this image, or null if there is none.
+     *
      * @return Mask Image XObject
      */
     public COSArray getColorKeyMask()
@@ -541,13 +562,14 @@ public final class PDImageXObject extends PDXObject implements PDImage
         COSBase mask = getCOSObject().getDictionaryObject(COSName.MASK);
         if (mask instanceof COSArray)
         {
-            return (COSArray)mask;
+            return (COSArray) mask;
         }
         return null;
     }

     /**
      * Returns the Soft Mask Image XObject associated with this image, or null if there is none.
+     *
      * @return the SMask Image XObject, or null.
      * @throws java.io.IOException
      */
@@ -568,8 +590,7 @@ public final class PDImageXObject extends PDXObject implements PDImage
         if (isStencil())
         {
             return 1;
-        }
-        else
+        } else
         {
             return getCOSObject().getInt(COSName.BITS_PER_COMPONENT, COSName.BPC);
         }
@@ -607,13 +628,11 @@ public final class PDImageXObject extends PDXObject implements PDImage
                 {
                     resources.getResourceCache().put(indirect, colorSpace);
                 }
-            }
-            else if (isStencil())
+            } else if (isStencil())
             {
                 // stencil mask color space must be gray, it is often missing
                 return PDDeviceGray.INSTANCE;
-            }
-            else
+            } else
             {
                 // an image without a color space is always broken
                 throw new IOException("could not determine color space");
@@ -628,6 +647,12 @@ public final class PDImageXObject extends PDXObject implements PDImage
         return getStream().createInputStream();
     }

+    @Override
+    public InputStream createInputStream(DecodeOptions options) throws IOException
+    {
+        return getStream().createInputStream(options);
+    }
+
     @Override
     public InputStream createInputStream(List<String> stopFilters) throws IOException
     {
@@ -713,6 +738,7 @@ public final class PDImageXObject extends PDXObject implements PDImage

     /**
      * This will get the suffix for this image type, e.g. jpg/png.
+     *
      * @return The image suffix or null if not available.
      */
     @Override
@@ -723,30 +749,24 @@ public final class PDImageXObject extends PDXObject implements PDImage
         if (filters == null)
         {
             return "png";
-        }
-        else if (filters.contains(COSName.DCT_DECODE))
+        } else if (filters.contains(COSName.DCT_DECODE))
         {
             return "jpg";
-        }
-        else if (filters.contains(COSName.JPX_DECODE))
+        } else if (filters.contains(COSName.JPX_DECODE))
         {
             return "jpx";
-        }
-        else if (filters.contains(COSName.CCITTFAX_DECODE))
+        } else if (filters.contains(COSName.CCITTFAX_DECODE))
         {
             return "tiff";
-        }
-        else if (filters.contains(COSName.FLATE_DECODE)
+        } else if (filters.contains(COSName.FLATE_DECODE)
                 || filters.contains(COSName.LZW_DECODE)
                 || filters.contains(COSName.RUN_LENGTH_DECODE))
         {
             return "png";
-        }
-        else if (filters.contains(COSName.JBIG2_DECODE))
+        } else if (filters.contains(COSName.JBIG2_DECODE))
         {
             return "jb2";
-        }
-        else
+        } else
         {
             LOG.warn("getSuffix() returns null, filters: " + filters);
             return null;
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDInlineImage.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDInlineImage.java
index dbdfba837..32f233dd2 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDInlineImage.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDInlineImage.java
@@ -16,17 +16,19 @@
  */
 package org.apache.pdfbox.pdmodel.graphics.image;

-import java.awt.Paint;
+import java.awt.*;
 import java.awt.image.BufferedImage;
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.util.List;
+
 import org.apache.pdfbox.cos.COSArray;
 import org.apache.pdfbox.cos.COSBase;
 import org.apache.pdfbox.cos.COSDictionary;
 import org.apache.pdfbox.cos.COSName;
+import org.apache.pdfbox.filter.DecodeOptions;
 import org.apache.pdfbox.filter.DecodeResult;
 import org.apache.pdfbox.filter.Filter;
 import org.apache.pdfbox.filter.FilterFactory;
@@ -58,8 +60,8 @@ public final class PDInlineImage implements PDImage
      * Creates an inline image from the given parameters and data.
      *
      * @param parameters the image parameters
-     * @param data the image data
-     * @param resources the current resources
+     * @param data       the image data
+     * @param resources  the current resources
      * @throws IOException if the stream cannot be decoded
      */
     public PDInlineImage(COSDictionary parameters, byte[] data, PDResources resources)
@@ -74,8 +76,7 @@ public final class PDInlineImage implements PDImage
         if (filters == null || filters.isEmpty())
         {
             this.decodedData = data;
-        }
-        else
+        } else
         {
             ByteArrayInputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
@@ -109,8 +110,7 @@ public final class PDInlineImage implements PDImage
         if (isStencil())
         {
             return 1;
-        }
-        else
+        } else
         {
             return parameters.getInt(COSName.BPC, COSName.BITS_PER_COMPONENT, -1);
         }
@@ -129,19 +129,17 @@ public final class PDInlineImage implements PDImage
         if (cs != null)
         {
             return createColorSpace(cs);
-        }
-        else if (isStencil())
+        } else if (isStencil())
         {
             // stencil mask color space must be gray, it is often missing
             return PDDeviceGray.INSTANCE;
-        }
-        else
+        } else
         {
             // an image without a color space is always broken
             throw new IOException("could not determine inline image color space");
         }
     }
-
+
     // deliver the long name of a device colorspace, or the parameter
     private COSBase toLongName(COSBase cs)
     {
@@ -159,7 +157,7 @@ public final class PDInlineImage implements PDImage
         }
         return cs;
     }
-
+
     private PDColorSpace createColorSpace(COSBase cs) throws IOException
     {
         if (cs instanceof COSName)
@@ -247,8 +245,7 @@ public final class PDInlineImage implements PDImage
         {
             COSName name = (COSName) filters;
             names = new COSArrayList<>(name.getName(), name, parameters, COSName.FILTER);
-        }
-        else if (filters instanceof COSArray)
+        } else if (filters instanceof COSArray)
         {
             names = COSArrayList.convertCOSNameCOSArrayToList((COSArray) filters);
         }
@@ -296,6 +293,12 @@ public final class PDInlineImage implements PDImage
         return new ByteArrayInputStream(decodedData);
     }

+    @Override
+    public InputStream createInputStream(DecodeOptions options) throws IOException
+    {
+        return createInputStream();
+    }
+
     @Override
     public InputStream createInputStream(List<String> stopFilters) throws IOException
     {
@@ -309,8 +312,7 @@ public final class PDInlineImage implements PDImage
             if (stopFilters.contains(filters.get(i)))
             {
                 break;
-            }
-            else
+            } else
             {
                 Filter filter = FilterFactory.INSTANCE.getFilter(filters.get(i));
                 filter.decode(in, out, parameters, i);
@@ -333,13 +335,19 @@ public final class PDInlineImage implements PDImage
     {
         return decodedData;
     }
-
+
     @Override
     public BufferedImage getImage() throws IOException
     {
         return SampledImageReader.getRGBImage(this, getColorKeyMask());
     }

+    @Override
+    public BufferedImage getImage(Rectangle region, int subsample) throws IOException
+    {
+        return getImage();
+    }
+
     @Override
     public BufferedImage getStencilImage(Paint paint) throws IOException
     {
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/SampledImageReader.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/SampledImageReader.java
index 0f60b8819..3fefd095b 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/SampledImageReader.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/SampledImageReader.java
@@ -16,9 +16,7 @@
  */
 package org.apache.pdfbox.pdmodel.graphics.image;

-import java.awt.Graphics2D;
-import java.awt.Paint;
-import java.awt.Point;
+import java.awt.*;
 import java.awt.image.BufferedImage;
 import java.awt.image.DataBuffer;
 import java.awt.image.DataBufferByte;
@@ -29,31 +27,35 @@ import java.io.InputStream;
 import java.util.Arrays;
 import javax.imageio.stream.ImageInputStream;
 import javax.imageio.stream.MemoryCacheImageInputStream;
+
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.pdfbox.cos.COSArray;
 import org.apache.pdfbox.cos.COSNumber;
+import org.apache.pdfbox.filter.DecodeOptions;
 import org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace;
 import org.apache.pdfbox.pdmodel.graphics.color.PDDeviceGray;
 import org.apache.pdfbox.pdmodel.graphics.color.PDIndexed;

 /**
  * Reads a sampled image from a PDF file.
+ *
  * @author John Hewson
  */
 final class SampledImageReader
 {
     private static final Log LOG = LogFactory.getLog(SampledImageReader.class);
-
+
     private SampledImageReader()
     {
     }

     /**
      * Returns an ARGB image filled with the given paint and using the given image as a mask.
+     *
      * @param paint the paint to fill the visible portions of the image with
      * @return a masked image filled with the given paint
-     * @throws IOException if the image cannot be read
+     * @throws IOException           if the image cannot be read
      * @throws IllegalStateException if the image is not a stencil.
      */
     public static BufferedImage getStencilImage(PDImage pdImage, Paint paint) throws IOException
@@ -122,7 +124,7 @@ final class SampledImageReader
                     LOG.warn("premature EOF, image will be incomplete");
                     break;
                 }
-            }
+            }
         }

         return masked;
@@ -132,23 +134,46 @@ final class SampledImageReader
      * Returns the content of the given image as an AWT buffered image with an RGB color space.
      * If a color key mask is provided then an ARGB image is returned instead.
      * This method never returns null.
- * @param pdImage the image to read + * + * @param pdImage the image to read * @param colorKey an optional color key mask * @return content of this image as an RGB buffered image * @throws IOException if the image cannot be read */ public static BufferedImage getRGBImage(PDImage pdImage, COSArray colorKey) throws IOException + { + return getRGBImage(pdImage, null, 1, colorKey); + } + + private static Rectangle clipRegion(PDImage pdImage, Rectangle region) + { + if (region == null) + { + return new Rectangle(0, 0, pdImage.getWidth(), pdImage.getHeight()); + } else + { + int x = Math.max(0, region.x); + int y = Math.max(0, region.y); + int width = Math.min(region.width, pdImage.getWidth() - x); + int height = Math.min(region.height, pdImage.getHeight() - y); + return new Rectangle(x, y, width, height); + } + } + + public static BufferedImage getRGBImage(PDImage pdImage, Rectangle region, int subsample, + COSArray colorKey) throws IOException { if (pdImage.isEmpty()) { throw new IOException("Image stream is empty"); } + Rectangle clipped = clipRegion(pdImage, region); // get parameters, they must be valid or have been repaired final PDColorSpace colorSpace = pdImage.getColorSpace(); final int numComponents = colorSpace.getNumberOfComponents(); - final int width = pdImage.getWidth(); - final int height = pdImage.getHeight(); + final int width = (int) Math.round(clipped.getWidth() / subsample); + final int height = (int) Math.round(clipped.getHeight() / subsample); final int bitsPerComponent = pdImage.getBitsPerComponent(); final float[] decode = getDecodeArray(pdImage); @@ -159,7 +184,7 @@ final class SampledImageReader if (bitsPerComponent == 1 && colorKey == null && numComponents == 1) { - return from1Bit(pdImage); + return from1Bit(pdImage, clipped, subsample, width, height); } // @@ -168,47 +193,65 @@ final class SampledImageReader // in depth to 8bpc as they will be drawn to TYPE_INT_RGB images anyway. 
All code // in PDColorSpace#toRGBImage expects an 8-bit range, i.e. 0-255. // - WritableRaster raster = Raster.createBandedRaster(DataBuffer.TYPE_BYTE, width, height, - numComponents, new Point(0, 0)); final float[] defaultDecode = pdImage.getColorSpace().getDefaultDecode(8); if (bitsPerComponent == 8 && Arrays.equals(decode, defaultDecode) && colorKey == null) { // convert image, faster path for non-decoded, non-colormasked 8-bit images - return from8bit(pdImage, raster); + return from8bit(pdImage, clipped, subsample, width, height); } - return fromAny(pdImage, raster, colorKey); + return fromAny(pdImage, colorKey, clipped, subsample, width, height); } - private static BufferedImage from1Bit(PDImage pdImage) throws IOException + private static BufferedImage from1Bit(PDImage pdImage, Rectangle clipped, int subsample, + final int width, final int height) throws IOException { final PDColorSpace colorSpace = pdImage.getColorSpace(); - final int width = pdImage.getWidth(); - final int height = pdImage.getHeight(); final float[] decode = getDecodeArray(pdImage); BufferedImage bim = null; WritableRaster raster; byte[] output; - if (colorSpace instanceof PDDeviceGray) - { - // TYPE_BYTE_GRAY and not TYPE_BYTE_BINARY because this one is handled - // without conversion to RGB by Graphics.drawImage - // this reduces the memory footprint, only one byte per pixel instead of three. 
- bim = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY); - raster = bim.getRaster(); - } - else - { - raster = Raster.createBandedRaster(DataBuffer.TYPE_BYTE, width, height, 1, new Point(0, 0)); - } - output = ((DataBufferByte) raster.getDataBuffer()).getData(); // read bit stream - try (InputStream iis = pdImage.createInputStream()) + DecodeOptions options = new DecodeOptions(subsample); + options.setSourceRegion(clipped); + try (InputStream iis = pdImage.createInputStream(options)) { + final int inputWidth, inputHeight, startx, starty, scanWidth, scanHeight; + if (options.isHonored()) + { + inputWidth = width; + inputHeight = height; + startx = 0; + starty = 0; + scanWidth = width; + scanHeight = height; + subsample = 1; + } else + { + inputWidth = pdImage.getWidth(); + inputHeight = pdImage.getHeight(); + startx = clipped.x; + starty = clipped.y; + scanWidth = clipped.width; + scanHeight = clipped.height; + } + if (colorSpace instanceof PDDeviceGray) + { + // TYPE_BYTE_GRAY and not TYPE_BYTE_BINARY because this one is handled + // without conversion to RGB by Graphics.drawImage + // this reduces the memory footprint, only one byte per pixel instead of three. 
+ bim = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY); + raster = bim.getRaster(); + } else + { + raster = Raster.createBandedRaster(DataBuffer.TYPE_BYTE, width, height, 1, new + Point(0, 0)); + } + output = ((DataBufferByte) raster.getDataBuffer()).getData(); final boolean isIndexed = colorSpace instanceof PDIndexed; - int rowLen = width / 8; - if (width % 8 > 0) + int rowLen = inputWidth / 8; + if (inputWidth % 8 > 0) { rowLen++; } @@ -220,18 +263,21 @@ final class SampledImageReader { value0 = 0; value1 = (byte) 255; - } - else + } else { value0 = (byte) 255; value1 = 0; } byte[] buff = new byte[rowLen]; int idx = 0; - for (int y = 0; y < height; y++) + for (int y = 0; y < starty + scanHeight; y++) { int x = 0; int readLen = iis.read(buff); + if (y < starty || y % subsample > 0) + { + continue; + } for (int r = 0; r < rowLen && r < readLen; r++) { int value = buff[r]; @@ -240,9 +286,14 @@ final class SampledImageReader { int bit = value & mask; mask >>= 1; + if (x < startx || x % subsample > 0) + { + x++; + continue; + } output[idx++] = bit == 0 ? 
value0 : value1; x++; - if (x == width) + if (x >= startx + scanWidth) { break; } @@ -266,31 +317,58 @@ final class SampledImageReader } // faster, 8-bit non-decoded, non-colormasked image conversion - private static BufferedImage from8bit(PDImage pdImage, WritableRaster raster) - throws IOException + private static BufferedImage from8bit(PDImage pdImage, Rectangle clipped, int subsample, + final int width, final int height) throws IOException { - try (InputStream input = pdImage.createInputStream()) + DecodeOptions options = new DecodeOptions(subsample); + options.setSourceRegion(clipped); + try (InputStream input = pdImage.createInputStream(options)) { + final int inputWidth, inputHeight, startx, starty, scanWidth, scanHeight; + if (options.isHonored()) + { + inputWidth = width; + inputHeight = height; + startx = 0; + starty = 0; + scanWidth = width; + scanHeight = height; + subsample = 1; + } else + { + inputWidth = pdImage.getWidth(); + inputHeight = pdImage.getHeight(); + startx = clipped.x; + starty = clipped.y; + scanWidth = clipped.width; + scanHeight = clipped.height; + } + final int numComponents = pdImage.getColorSpace().getNumberOfComponents(); + WritableRaster raster = Raster.createBandedRaster(DataBuffer.TYPE_BYTE, width, height, + numComponents, new Point(0, 0)); // get the raster's underlying byte buffer byte[][] banks = ((DataBufferByte) raster.getDataBuffer()).getBankData(); - final int width = pdImage.getWidth(); - final int height = pdImage.getHeight(); - final int numComponents = pdImage.getColorSpace().getNumberOfComponents(); - byte[] tempBytes = new byte[numComponents * width]; + byte[] tempBytes = new byte[numComponents * inputWidth]; // compromise between memory and time usage: // reading the whole image consumes too much memory // reading one pixel at a time makes it slow in our buffering infrastructure int i = 0; - for (int y = 0; y < height; ++y) + for (int y = 0; y < starty + scanHeight; ++y) { long inputResult = input.read(tempBytes); 
if (Long.compare(inputResult, tempBytes.length) != 0) { - LOG.debug("Tried reading " + tempBytes.length + " bytes but only " + inputResult + " bytes read"); + LOG.debug("Tried reading " + tempBytes.length + " bytes but only " + + inputResult + " bytes read"); + } + // + if (y < starty || y % subsample > 0) + { + continue; } - for (int x = 0; x < width; ++x) + for (int x = startx; x < startx + scanWidth; x += subsample) { for (int c = 0; c < numComponents; c++) { @@ -305,19 +383,42 @@ final class SampledImageReader } // slower, general-purpose image conversion from any image format - private static BufferedImage fromAny(PDImage pdImage, WritableRaster raster, COSArray colorKey) + private static BufferedImage fromAny(PDImage pdImage, COSArray colorKey, Rectangle clipped, + int subsample, final int width, final int height) throws IOException { final PDColorSpace colorSpace = pdImage.getColorSpace(); final int numComponents = colorSpace.getNumberOfComponents(); - final int width = pdImage.getWidth(); - final int height = pdImage.getHeight(); final int bitsPerComponent = pdImage.getBitsPerComponent(); final float[] decode = getDecodeArray(pdImage); + DecodeOptions options = new DecodeOptions(subsample); + options.setSourceRegion(clipped); // read bit stream - try (ImageInputStream iis = new MemoryCacheImageInputStream(pdImage.createInputStream())) + try (ImageInputStream iis = new MemoryCacheImageInputStream(pdImage.createInputStream + (options))) { + final int inputWidth, inputHeight, startx, starty, scanWidth, scanHeight; + if (options.isHonored()) + { + inputWidth = width; + inputHeight = height; + startx = 0; + starty = 0; + scanWidth = width; + scanHeight = height; + subsample = 1; + } else + { + inputWidth = pdImage.getWidth(); + inputHeight = pdImage.getHeight(); + startx = clipped.x; + starty = clipped.y; + scanWidth = clipped.width; + scanHeight = clipped.height; + } + WritableRaster raster = Raster.createBandedRaster(DataBuffer.TYPE_BYTE, width, height, + 
numComponents, new Point(0, 0)); final float sampleMax = (float) Math.pow(2, bitsPerComponent) - 1f; final boolean isIndexed = colorSpace instanceof PDIndexed; @@ -332,28 +433,28 @@ final class SampledImageReader // calculate row padding int padding = 0; - if (width * numComponents * bitsPerComponent % 8 > 0) + if (inputWidth * numComponents * bitsPerComponent % 8 > 0) { - padding = 8 - (width * numComponents * bitsPerComponent % 8); + padding = 8 - (inputWidth * numComponents * bitsPerComponent % 8); } // read stream byte[] srcColorValues = new byte[numComponents]; byte[] alpha = new byte[1]; - for (int y = 0; y < height; y++) + for (int y = 0; y < starty + scanHeight; y++) { - for (int x = 0; x < width; x++) + for (int x = 0; x < startx + scanWidth; x++) { boolean isMasked = true; for (int c = 0; c < numComponents; c++) { - int value = (int)iis.readBits(bitsPerComponent); + int value = (int) iis.readBits(bitsPerComponent); // color key mask requires values before they are decoded if (colorKeyRanges != null) { isMasked &= value >= colorKeyRanges[c * 2] && - value <= colorKeyRanges[c * 2 + 1]; + value <= colorKeyRanges[c * 2 + 1]; } // decode array @@ -368,23 +469,26 @@ final class SampledImageReader // indexed color spaces get the raw value, because the TYPE_BYTE // below cannot be reversed by the color space without it having // knowledge of the number of bits per component - srcColorValues[c] = (byte)Math.round(output); - } - else + srcColorValues[c] = (byte) Math.round(output); + } else { // interpolate to TYPE_BYTE int outputByte = Math.round(((output - Math.min(dMin, dMax)) / Math.abs(dMax - dMin)) * 255f); - srcColorValues[c] = (byte)outputByte; + srcColorValues[c] = (byte) outputByte; } } - raster.setDataElements(x, y, srcColorValues); + if (x >= startx && y >= starty && x % subsample == 0 && y % subsample == 0) + { + raster.setDataElements((x - startx) / subsample, (y - starty) / subsample, + srcColorValues); + } // set alpha channel in color key mask, if 
any if (colorKeyMask != null) { - alpha[0] = (byte)(isMasked ? 255 : 0); + alpha[0] = (byte) (isMasked ? 255 : 0); colorKeyMask.getRaster().setDataElements(x, y, alpha); } } @@ -400,8 +504,7 @@ final class SampledImageReader if (colorKeyMask != null) { return applyColorKeyMask(rgbImage, colorKeyMask); - } - else + } else { return rgbImage; } @@ -466,15 +569,14 @@ final class SampledImageReader LOG.warn("decode array " + cosDecode + " not compatible with color space, using the first two entries"); return new float[] - { - decode0, decode1 - }; + { + decode0, decode1 + }; } } LOG.error("decode array " + cosDecode + " not compatible with color space, using default"); - } - else + } else { decode = cosDecode.toFloatArray(); } diff --git a/pdfbox/src/main/java/org/apache/pdfbox/rendering/PageDrawer.java b/pdfbox/src/main/java/org/apache/pdfbox/rendering/PageDrawer.java index 052ea1223..42c0b5a45 100644 --- a/pdfbox/src/main/java/org/apache/pdfbox/rendering/PageDrawer.java +++ b/pdfbox/src/main/java/org/apache/pdfbox/rendering/PageDrawer.java @@ -955,7 +955,10 @@ public class PageDrawer extends PDFGraphicsStreamEngine else { // draw the image - drawBufferedImage(pdImage.getImage(), at); + int subsample = (int)Math.floor(pdImage.getWidth()/at.getScaleX()); + if (subsample<1) subsample = 1; + if (subsample>8) subsample = 8; + drawBufferedImage(pdImage.getImage(null, subsample), at); } if (!pdImage.getInterpolate()) diff --git a/pdfbox/src/test/java/org/apache/pdfbox/pdmodel/common/PDStreamTest.java b/pdfbox/src/test/java/org/apache/pdfbox/pdmodel/common/PDStreamTest.java index de0f63ee6..4062c8be3 100644 --- a/pdfbox/src/test/java/org/apache/pdfbox/pdmodel/common/PDStreamTest.java +++ b/pdfbox/src/test/java/org/apache/pdfbox/pdmodel/common/PDStreamTest.java @@ -91,7 +91,7 @@ public class PDStreamTest PDStream pdStream = new PDStream(doc, is, new COSArray()); Assert.assertEquals(0, pdStream.getFilters().size()); - is = pdStream.createInputStream(null); + is = 
pdStream.createInputStream((List<String>)null); Assert.assertEquals(12, is.read()); Assert.assertEquals(34, is.read()); Assert.assertEquals(56, is.read());
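For anyone skimming the diff, the fallback path in SampledImageReader (the one taken when a filter does not set the "honored" flag) boils down to: clip the requested region to the image bounds, then copy every Nth sample of every Nth row. Below is a minimal standalone sketch of that idea for a single-band, 8-bit image held in a byte array. It is not code from the patch. The class and method names (SubsampleSketch, clip, sampledSize, subsample) are hypothetical, it uses Rectangle.intersection instead of the patch's clipRegion, and it counts samples relative to the region origin and rounds up, which is an assumption of this sketch rather than what the patch's Math.round and `x % subsample` checks do.

```java
import java.awt.Rectangle;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in PDFBox.
final class SubsampleSketch
{
    private SubsampleSketch()
    {
    }

    // Clip the requested region to the image bounds; null means "whole image".
    static Rectangle clip(Rectangle region, int imageWidth, int imageHeight)
    {
        Rectangle full = new Rectangle(0, 0, imageWidth, imageHeight);
        return region == null ? full : full.intersection(region);
    }

    // Number of samples taken from `length` pixels when keeping every
    // `subsample`-th pixel, starting at the first one (rounds up).
    static int sampledSize(int length, int subsample)
    {
        return (length + subsample - 1) / subsample;
    }

    // Copy every `subsample`-th pixel of every `subsample`-th row of the
    // clipped region out of a row-major, one-byte-per-pixel source buffer.
    static byte[] subsample(byte[] src, int imageWidth, int imageHeight,
                            Rectangle region, int subsample)
    {
        if (subsample < 1)
        {
            throw new IllegalArgumentException("subsample must be >= 1");
        }
        Rectangle r = clip(region, imageWidth, imageHeight);
        if (r.isEmpty())
        {
            // the requested region lies entirely outside the image
            return new byte[0];
        }
        int outWidth = sampledSize(r.width, subsample);
        int outHeight = sampledSize(r.height, subsample);
        byte[] out = new byte[outWidth * outHeight];
        int idx = 0;
        for (int y = r.y; y < r.y + r.height; y += subsample)
        {
            for (int x = r.x; x < r.x + r.width; x += subsample)
            {
                out[idx++] = src[y * imageWidth + x];
            }
        }
        return out;
    }

    public static void main(String[] args)
    {
        // 4x4 test image whose pixel values are 0..15 in row-major order
        byte[] src = new byte[16];
        for (int i = 0; i < src.length; i++)
        {
            src[i] = (byte) i;
        }
        byte[] out = subsample(src, 4, 4, new Rectangle(1, 1, 3, 3), 2);
        System.out.println(Arrays.toString(out)); // prints [5, 7, 13, 15]
    }
}
```

Sizing the output by rounding up from the region origin keeps the buffer length and the number of loop iterations in agreement for any region offset. That is the property to check when comparing against the width/height computation in getRGBImage above.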