[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475470#comment-16475470 ]
Emmeran Seehuber commented on PDFBOX-4184:
------------------------------------------

Just got an idea in the shower ...

{code:java}
Benchmark                                   (zipLevel)   Mode  Cnt    Score   Error  Units
LosslessFactoryBenchmark.predictor                   3  thrpt    5  168.186 ± 1.884  ops/s
LosslessFactoryBenchmark.predictor                   6  thrpt    5  109.865 ± 2.022  ops/s
LosslessFactoryBenchmark.predictor                   9  thrpt    5   20.382 ± 0.432  ops/s
LosslessFactoryBenchmark.predictorBig                3  thrpt    5    2.617 ± 0.047  ops/s
LosslessFactoryBenchmark.predictorBig                6  thrpt    5    2.211 ± 0.029  ops/s
LosslessFactoryBenchmark.predictorBig                9  thrpt    5    1.627 ± 0.039  ops/s
LosslessFactoryBenchmark.predictorBigBytes           3  thrpt    5    2.219 ± 0.055  ops/s
LosslessFactoryBenchmark.predictorBigBytes           6  thrpt    5    1.880 ± 0.057  ops/s
LosslessFactoryBenchmark.predictorBigBytes           9  thrpt    5    1.454 ± 0.025  ops/s
LosslessFactoryBenchmark.rgbOnly                     3  thrpt    5  247.996 ± 7.758  ops/s
LosslessFactoryBenchmark.rgbOnly                     6  thrpt    5  128.242 ± 3.246  ops/s
LosslessFactoryBenchmark.rgbOnly                     9  thrpt    5   14.259 ± 0.339  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  3  thrpt    5    8.113 ± 0.290  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  6  thrpt    5    3.317 ± 0.059  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  9  thrpt    5    1.308 ± 0.025  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             3  thrpt    5    3.506 ± 0.066  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             6  thrpt    5    2.149 ± 0.070  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             9  thrpt    5    1.081 ± 0.019  ops/s
{code}

Now the predictor is always faster at zip level 9. It is still slower at the other zip levels, but not that much.

[^lossless_predictor_based_imageencoding_v4.patch]

I would be fine with this, so no api change would be needed.

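For readers following the thread, here is a minimal sketch of the trade-off the benchmark measures: applying a PNG row filter before Flate compression at a given zip level. It is not the code from the attached patch. The class and method names (PredictorSketch, subFilter, deflatedSize, gradient) are hypothetical, it always uses the PNG "Sub" filter rather than choosing a filter per row, and it assumes 8-bit RGB samples instead of the 16-bit case discussed in the issue.

```java
import java.util.zip.Deflater;

// Hypothetical illustration only; not part of PDFBox or of the attached patch.
public class PredictorSketch {

    /**
     * Applies the PNG "Sub" filter (filter type 1) to every row.
     * Each output row is one filter-type byte followed by the filtered samples.
     */
    public static byte[] subFilter(byte[] raw, int width, int height, int bytesPerPixel) {
        int stride = width * bytesPerPixel;
        byte[] out = new byte[(stride + 1) * height];
        int o = 0;
        for (int y = 0; y < height; y++) {
            out[o++] = 1; // filter type byte: 1 = Sub
            int rowStart = y * stride;
            for (int x = 0; x < stride; x++) {
                // The predictor is the corresponding byte of the pixel to the left (0 at row start).
                int left = x >= bytesPerPixel ? raw[rowStart + x - bytesPerPixel] & 0xFF : 0;
                out[o++] = (byte) ((raw[rowStart + x] & 0xFF) - left);
            }
        }
        return out;
    }

    /** Returns the number of bytes produced by Deflater at the given level (0-9). */
    public static int deflatedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** Builds a synthetic 8-bit RGB gradient as test input. */
    public static byte[] gradient(int width, int height) {
        byte[] rgb = new byte[width * height * 3];
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                rgb[i++] = (byte) x;
                rgb[i++] = (byte) (x + y);
                rgb[i++] = (byte) y;
            }
        }
        return rgb;
    }

    public static void main(String[] args) {
        int width = 256, height = 64;
        byte[] raw = gradient(width, height);
        byte[] filtered = subFilter(raw, width, height, 3);
        for (int level : new int[] {3, 6, 9}) {
            System.out.println("level " + level
                    + ": raw=" + deflatedSize(raw, level)
                    + " bytes, filtered=" + deflatedSize(filtered, level) + " bytes");
        }
    }
}
```

The filtered rows carry one extra byte each, which is the layout a PNG predictor in the image's DecodeParms tells a reader to expect. Real photos will not compress as well as this gradient, so the numbers it prints say nothing about the benchmark figures above.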
> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-4184
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4184
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.9
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>             Fix For: 2.0.10, 3.0.0 PDFBox
>
>         Attachments: LoadGovdocs.java,
> lossless_predictor_based_imageencoding.patch,
> lossless_predictor_based_imageencoding_v2.patch,
> lossless_predictor_based_imageencoding_v3.patch,
> lossless_predictor_based_imageencoding_v4.patch,
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf,
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images
> correctly. I've integrated a test for this here:
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as
> the images are currently not efficiently encoded. For example, you could use
> PNG encodings to get better compression (by adding a COSName.DECODE_PARMS with
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is
> something for a later patch. It would also need another API, as there is a
> tradeoff between speed and compression ratio.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org