[ 
https://issues.apache.org/jira/browse/PDFBOX-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18000676#comment-18000676
 ] 

Zer Jun Eng commented on PDFBOX-6030:
-------------------------------------

> How about using 
> "org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromByteArray(PDDocument,
>  byte[])" as workaround? The already encoded image is passed as byte array, 
> so that one might use any suitable process to encode such an image.

We also evaluated the `JPEGFactory.createFromByteArray(PDDocument, byte[])` 
method. We still find `JPEGFactory.createFromImage(PDDocument, 
BufferedImage, float, int)` the most convenient, because it handles the alpha 
channel nicely in the private `createJPEG` method.

https://github.com/apache/pdfbox/blob/3.0.5/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/JPEGFactory.java#L305-L329
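
To illustrate why this matters for the `createFromByteArray` route: JPEG does not carry an alpha channel, so a caller that encodes the image itself also has to split off the alpha channel first. Below is a minimal sketch of that extra work using only `java.awt` and `javax.imageio`. It is not PDFBox's code; the class `AlphaSplitSketch` and its method names are hypothetical, and attaching the alpha image to the PDF as a soft mask is left out.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper for illustration only, not part of PDFBox.
class AlphaSplitSketch {

  // Copies the color channels into an opaque RGB image, dropping alpha.
  static BufferedImage extractColor(BufferedImage src) {
    BufferedImage rgb =
        new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
    for (int y = 0; y < src.getHeight(); y++) {
      for (int x = 0; x < src.getWidth(); x++) {
        rgb.setRGB(x, y, src.getRGB(x, y) & 0x00FFFFFF);
      }
    }
    return rgb;
  }

  // Returns the alpha channel as an 8-bit gray image, or null if src has no alpha.
  static BufferedImage extractAlpha(BufferedImage src) {
    if (!src.getColorModel().hasAlpha()) {
      return null;
    }
    BufferedImage alpha =
        new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
    WritableRaster raster = alpha.getRaster();
    for (int y = 0; y < src.getHeight(); y++) {
      for (int x = 0; x < src.getWidth(); x++) {
        raster.setSample(x, y, 0, src.getRGB(x, y) >>> 24);
      }
    }
    return alpha;
  }

  // Encodes an opaque image as JPEG, with the Huffman option under caller control.
  static byte[] encodeJpeg(BufferedImage image, float quality, boolean optimizeHuffman)
      throws IOException {
    Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
    while (writers.hasNext()) {
      ImageWriter writer = writers.next();
      ImageWriteParam param = writer.getDefaultWriteParam();
      if (!(param instanceof JPEGImageWriteParam)) {
        writer.dispose();
        continue;
      }
      JPEGImageWriteParam jpegParam = (JPEGImageWriteParam) param;
      jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
      jpegParam.setCompressionQuality(quality);
      jpegParam.setOptimizeHuffmanTables(optimizeHuffman);
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
        writer.setOutput(ios);
        writer.write(null, new IIOImage(image, null, null), jpegParam);
      } finally {
        writer.dispose();
      }
      // Read the bytes only after the image stream has been closed.
      return baos.toByteArray();
    }
    throw new IOException("No JPEG ImageWriter found");
  }

  public static void main(String[] args) throws IOException {
    BufferedImage src = new BufferedImage(64, 64, BufferedImage.TYPE_INT_ARGB);
    src.setRGB(10, 10, 0x80FF0000); // one half-transparent red pixel
    byte[] colorJpeg = encodeJpeg(extractColor(src), 0.75f, true);
    BufferedImage alpha = extractAlpha(src);
    System.out.printf("color JPEG: %d bytes, has alpha: %b%n",
        colorJpeg.length, alpha != null);
    // colorJpeg could then be passed to JPEGFactory.createFromByteArray(PDDocument, byte[]);
    // the alpha image would still have to be attached to the PDF image by the caller.
  }
}
```

With `createFromImage`, both steps happen inside PDFBox, which is why an option on that method would suit us better than the byte-array route.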

> JPEGFactory: createImage and setOptimizeHuffmanTables
> -----------------------------------------------------
>
>                 Key: PDFBOX-6030
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6030
>             Project: PDFBox
>          Issue Type: Wish
>    Affects Versions: 2.0.34, 3.0.5 PDFBox
>            Reporter: Zer Jun Eng
>            Priority: Minor
>              Labels: JPEG, JPG, jpeg
>             Fix For: 2.0.35, 3.0.6 PDFBox, 4.0.0
>
>         Attachments: PDFBOX-6030.diff, zoo-711050_1920.jpg
>
>
> Dear PDFBox developers,
> I'm writing to request an enhancement to the JPEGFactory class, specifically 
> concerning the createFromImage(PDDocument document, BufferedImage image, 
> float quality, int dpi) method.
> Currently, when using this method, there isn't a direct way to enable the 
> setOptimizeHuffmanTables option of JPEGImageWriteParam. This optimization can 
> be quite beneficial for reducing file size.
> To work around this, my team currently has to copy the JPEGFactory source 
> code into our project and modify the private encodeImageToJPEGStream method. 
> This approach isn't ideal as it makes maintenance more difficult and prevents 
> us from easily updating to new PDFBox versions.
> Would you consider exposing this setOptimizeHuffmanTables option, perhaps as 
> an additional parameter to the createFromImage method or through a separate 
> setter on JPEGFactory? This would allow users to leverage this optimization 
> without resorting to workarounds.
> Thank you for considering this request.
> —
> Replying to the email thread: 
> https://lists.apache.org/thread/pgo0m1r8vgxd12zl3499fv38s163mpm3
> I wrote a minimal benchmark code that compares the difference between the 
> output file size and execution time with and without setOptimizeHuffmanTables:
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.ByteArrayOutputStream;
> import java.io.File;
> import java.io.IOException;
> import java.time.Duration;
> import java.time.Instant;
> import java.util.Iterator;
> import javax.imageio.IIOImage;
> import javax.imageio.ImageIO;
> import javax.imageio.ImageTypeSpecifier;
> import javax.imageio.ImageWriteParam;
> import javax.imageio.ImageWriter;
> import javax.imageio.metadata.IIOMetadata;
> import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
> import javax.imageio.stream.ImageOutputStream;
> import org.w3c.dom.Element;
> class Huffman {
>   private static ImageWriter getJPEGImageWriter() throws IOException {
>     Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
>     while (writers.hasNext()) {
>       ImageWriter writer = writers.next();
>       if (writer == null) {
>         continue;
>       }
>       // PDFBOX-3566: avoid CLibJPEGImageWriter, which is not a JPEGImageWriteParam
>       if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
>         return writer;
>       }
>       writer.dispose();
>     }
>     throw new IOException("No ImageWriter found for JPEG format");
>   }
>   public static byte[] encodeImageToJPEGStream(BufferedImage image, float quality, int dpi,
>       boolean optimizeHuffman) throws IOException {
>     ImageWriter imageWriter = getJPEGImageWriter(); // find JAI writer
>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
>       imageWriter.setOutput(ios);
>       // add compression
>       JPEGImageWriteParam jpegParam = (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
>       jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
>       jpegParam.setCompressionQuality(quality);
>       jpegParam.setOptimizeHuffmanTables(optimizeHuffman);
>       // add metadata
>       ImageTypeSpecifier imageTypeSpecifier = new ImageTypeSpecifier(image);
>       IIOMetadata data = imageWriter.getDefaultImageMetadata(imageTypeSpecifier, jpegParam);
>       Element tree = (Element) data.getAsTree("javax_imageio_jpeg_image_1.0");
>       Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
>       String dpiString = Integer.toString(dpi);
>       jfif.setAttribute("Xdensity", dpiString);
>       jfif.setAttribute("Ydensity", dpiString);
>       jfif.setAttribute("resUnits", "1"); // 1 = dots/inch
>       // write
>       imageWriter.write(data, new IIOImage(image, null, null), jpegParam);
>       return baos.toByteArray();
>     } finally {
>       imageWriter.dispose();
>     }
>   }
>   public static long benchmark(BufferedImage img, boolean optimizeHuffman) throws IOException {
>     final float quality = 0.75f;
>     final int dpi = 72;
>     Instant i1 = Instant.now();
>     int length = encodeImageToJPEGStream(img, quality, dpi, optimizeHuffman).length;
>     Instant i2 = Instant.now();
>     long executionTime = Duration.between(i1, i2).toMillis();
>     System.out.printf("optimize Huffman = %b: %d bytes, execution time %d ms%n",
>         optimizeHuffman, length, executionTime);
>     return executionTime;
>   }
>   public static void main(String[] args) throws IOException {
>     final int runs = 100;
>     long totalOptimizedExecutionTime = 0L;
>     long totalUnoptimizedExecutionTime = 0L;
>     BufferedImage img = ImageIO.read(new File("zoo-711050_1920.jpg"));
>     for (int i = 0; i < runs; i++) {
>       totalOptimizedExecutionTime += benchmark(img, true);
>       totalUnoptimizedExecutionTime += benchmark(img, false);
>     }
>     
>     float avgOptimizedExecutionTime = (float) totalOptimizedExecutionTime / runs;
>     float avgUnoptimizedExecutionTime = (float) totalUnoptimizedExecutionTime / runs;
>     System.out.printf("Average optimized execution time: %f ms%n", avgOptimizedExecutionTime);
>     System.out.printf("Average unoptimized execution time: %f ms%n", avgUnoptimizedExecutionTime);
>   }
> }
> {code}
> {code:sh}
> ...
> optimize Huffman = true: 580768 bytes, execution time 192 ms
> optimize Huffman = false: 589050 bytes, execution time 167 ms
> Average optimized execution time: 192.729996 ms
> Average unoptimized execution time: 167.929993 ms
> {code}
> I used an image I randomly picked from https://pixabay.com/ (attached below). 
> The results show that enabling setOptimizeHuffmanTables produces a slightly 
> smaller file (580768 vs. 589050 bytes, about 1.4% smaller) but takes longer 
> to execute (about 193 ms vs. 168 ms on average, roughly 15% slower).


