[ https://issues.apache.org/jira/browse/PDFBOX-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18000676#comment-18000676 ]
Zer Jun Eng commented on PDFBOX-6030:
-------------------------------------

> How about using "org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromByteArray(PDDocument, byte[])" as workaround? The already encoded image is passed as byte array, so that one might use any suitable process to encode such an image.

We also evaluated the `JPEGFactory.createFromByteArray(PDDocument, byte[])` method. We still find `JPEGFactory.createFromImage(PDDocument, BufferedImage, float, int)` the most convenient because it handles the alpha channel nicely in the private `createJPEG` method.

https://github.com/apache/pdfbox/blob/3.0.5/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/JPEGFactory.java#L305-L329

> JPEGFactory: createImage and setOptimizeHuffmanTables
> -----------------------------------------------------
>
>                 Key: PDFBOX-6030
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6030
>             Project: PDFBox
>          Issue Type: Wish
>    Affects Versions: 2.0.34, 3.0.5 PDFBox
>            Reporter: Zer Jun Eng
>            Priority: Minor
>              Labels: JPEG, JPG, jpeg
>             Fix For: 2.0.35, 3.0.6 PDFBox, 4.0.0
>
>         Attachments: PDFBOX-6030.diff, zoo-711050_1920.jpg
>
>
> Dear PDFBox developers,
>
> I'm writing to request an enhancement to the JPEGFactory class, specifically concerning the createFromImage(PDDocument document, BufferedImage image, float quality, int dpi) method.
>
> Currently, when using this method, there isn't a direct way to enable the setOptimizeHuffmanTables option of JPEGImageWriteParam. This optimization can be quite beneficial for reducing file size.
>
> To work around this, my team currently has to copy the JPEGFactory source code into our project and modify the private encodeImageToJPEGStream method. This approach isn't ideal as it makes maintenance more difficult and prevents us from easily updating to new PDFBox versions.
>
> Would you consider exposing this setOptimizeHuffmanTables option, perhaps as an additional parameter to the createFromImage method or through a separate setter on JPEGFactory? This would allow users to leverage this optimization without resorting to workarounds.
>
> Thank you for considering this request.
>
> —
>
> Replying to the email thread: https://lists.apache.org/thread/pgo0m1r8vgxd12zl3499fv38s163mpm3
>
> I wrote a minimal benchmark that compares the output file size and execution time with and without setOptimizeHuffmanTables:
>
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.ByteArrayOutputStream;
> import java.io.File;
> import java.io.IOException;
> import java.time.Duration;
> import java.time.Instant;
> import java.util.Iterator;
> import javax.imageio.IIOImage;
> import javax.imageio.ImageIO;
> import javax.imageio.ImageTypeSpecifier;
> import javax.imageio.ImageWriteParam;
> import javax.imageio.ImageWriter;
> import javax.imageio.metadata.IIOMetadata;
> import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
> import javax.imageio.stream.ImageOutputStream;
> import org.w3c.dom.Element;
>
> class Huffman {
>   private static ImageWriter getJPEGImageWriter() throws IOException {
>     Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
>     while (writers.hasNext()) {
>       ImageWriter writer = writers.next();
>       if (writer == null) {
>         continue;
>       }
>       // PDFBOX-3566: avoid CLibJPEGImageWriter, which is not a JPEGImageWriteParam
>       if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
>         return writer;
>       }
>       writer.dispose();
>     }
>     throw new IOException("No ImageWriter found for JPEG format");
>   }
>
>   public static byte[] encodeImageToJPEGStream(BufferedImage image, float quality, int dpi,
>                                                boolean optimizeHuffman)
>       throws IOException {
>     ImageWriter imageWriter = getJPEGImageWriter(); // find JAI writer
>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
>       imageWriter.setOutput(ios);
>       // add compression
>       JPEGImageWriteParam jpegParam = (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
>       jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
>       jpegParam.setCompressionQuality(quality);
>       jpegParam.setOptimizeHuffmanTables(optimizeHuffman);
>       // add metadata
>       ImageTypeSpecifier imageTypeSpecifier = new ImageTypeSpecifier(image);
>       IIOMetadata data = imageWriter.getDefaultImageMetadata(imageTypeSpecifier, jpegParam);
>       Element tree = (Element) data.getAsTree("javax_imageio_jpeg_image_1.0");
>       Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
>       String dpiString = Integer.toString(dpi);
>       jfif.setAttribute("Xdensity", dpiString);
>       jfif.setAttribute("Ydensity", dpiString);
>       jfif.setAttribute("resUnits", "1"); // 1 = dots/inch
>       // write
>       imageWriter.write(data, new IIOImage(image, null, null), jpegParam);
>       return baos.toByteArray();
>     } finally {
>       imageWriter.dispose();
>     }
>   }
>
>   public static long benchmark(BufferedImage img, boolean optimizeHuffman) throws IOException {
>     final float quality = 0.75f;
>     final int dpi = 72;
>     Instant i1 = Instant.now();
>     int length = encodeImageToJPEGStream(img, quality, dpi, optimizeHuffman).length;
>     Instant i2 = Instant.now();
>     long executionTime = Duration.between(i1, i2).toMillis();
>     System.out.printf("optimize Huffman = %b: %d bytes, execution time %d ms%n",
>         optimizeHuffman, length, executionTime);
>     return executionTime;
>   }
>
>   public static void main(String[] args) throws IOException {
>     final int runs = 100;
>     long totalOptimizedExecutionTime = 0L;
>     long totalUnoptimizedExecutionTime = 0L;
>     BufferedImage img = ImageIO.read(new File("zoo-711050_1920.jpg"));
>     for (int i = 0; i < runs; i++) {
>       totalOptimizedExecutionTime += benchmark(img, true);
>       totalUnoptimizedExecutionTime += benchmark(img, false);
>     }
>
>     float avgOptimizedExecutionTime = (float) totalOptimizedExecutionTime / runs;
>     float avgUnoptimizedExecutionTime = (float) totalUnoptimizedExecutionTime / runs;
>     System.out.printf("Average optimized execution time: %f ms%n", avgOptimizedExecutionTime);
>     System.out.printf("Average unoptimized execution time: %f ms%n", avgUnoptimizedExecutionTime);
>   }
> }
> {code}
>
> {code:sh}
> ...
> optimize Huffman = true: 580768 bytes, execution time 192 ms
> optimize Huffman = false: 589050 bytes, execution time 167 ms
> Average optimized execution time: 192.729996 ms
> Average unoptimized execution time: 167.929993 ms
> {code}
>
> I used an image I randomly picked from https://pixabay.com/ (attached below).
>
> The results show that enabling setOptimizeHuffmanTables produces a slightly smaller file size but takes longer to execute.
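
For readers following the thread, the createFromByteArray workaround quoted at the top could be sketched roughly as below. This is only a sketch under stated assumptions, not PDFBox code: the class name `HuffmanWorkaround` and the method `encodeOptimized` are hypothetical, it uses only the JDK's javax.imageio classes, and it assumes an opaque RGB image, so it does not reproduce the alpha-channel handling that createFromImage performs in `createJPEG`.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper: encodes an opaque RGB image to JPEG bytes
// with optimized Huffman tables, outside of PDFBox.
class HuffmanWorkaround {
    static byte[] encodeOptimized(BufferedImage rgbImage, float quality) throws IOException {
        // Pick the first JPEG writer whose default param is a JPEGImageWriteParam.
        ImageWriter writer = null;
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        while (writers.hasNext()) {
            ImageWriter candidate = writers.next();
            if (candidate.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
                writer = candidate;
                break;
            }
            candidate.dispose();
        }
        if (writer == null) {
            throw new IOException("No ImageWriter found for JPEG format");
        }

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(ios);
            JPEGImageWriteParam param = (JPEGImageWriteParam) writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            param.setOptimizeHuffmanTables(true);
            // null stream metadata: the writer falls back to its defaults.
            writer.write(null, new IIOImage(rgbImage, null, null), param);
        } finally {
            writer.dispose();
        }
        // Read the bytes only after the image stream has been closed and flushed.
        return baos.toByteArray();
    }
}
```

The returned bytes could then be passed to `JPEGFactory.createFromByteArray(PDDocument, byte[])`, the method named in the quoted suggestion. Any transparency in the source image would have to be handled separately, which is the convenience argument made above for createFromImage.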