[ https://issues.apache.org/jira/browse/SOLR-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951530#comment-15951530 ]
Hoss Man commented on SOLR-10298:
---------------------------------
bq. I looked at pdfbox's examples, ...
bq. ...It seems you can use PDFBox to compress an output stream to FLATE_DECODE
...
I don't know why it didn't occur to me before to look into using PDFBox, but
after a little poking around in their examples, and with the tip from Cassandra about
{{COSName.FLATE_DECODE}}, I was able to modify one of the PDFBox examples to use
{{COSName.FLATE_DECODE}} in a new (pure Java!) tool. It shrinks the asciidoctor-generated
PDF from 27M to 9.4M using internal PDF stream compression alone, without
any image resizing/resampling.
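The PDF FlateDecode filter is zlib/deflate compression (RFC 1950/1951), so the gain above comes from deflating streams that were presumably stored uncompressed. The sketch below is illustrative only. It does not use PDFBox or parse a real PDF, and it is not the ReducePDFSize tool. It uses only `java.util.zip` on a made-up, repetitive page content stream to show what a FlateDecode stream body is. The names `FlateDemo`, `flateEncode`, `flateDecode`, and `sampleContentStream` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration: NOT the ReducePDFSize tool, and no PDF parsing.
final class FlateDemo {

    // Deflater's default mode (nowrap=false) emits zlib format (RFC 1950),
    // which is the encoding a /FlateDecode stream body uses.
    static byte[] flateEncode(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inverse operation, as a PDF reader would apply when decoding the stream.
    static byte[] flateDecode(byte[] encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-based zlib data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Made-up uncompressed content stream: repetitive text-drawing operators.
    static byte[] sampleContentStream() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("BT /F1 10 Tf 72 ").append(700 - (i % 50) * 12)
              .append(" Td (Apache Solr Reference Guide) Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream();
        byte[] encoded = flateEncode(raw);
        byte[] decoded = flateDecode(encoded);
        System.out.printf("raw=%d bytes, FlateDecode=%d bytes, round-trip ok=%b%n",
                raw.length, encoded.length, Arrays.equals(raw, decoded));
    }
}
```

Running `main` prints the raw and compressed byte counts for the sample. Real documents mix in image data and less repetitive text, so the sample's ratio should not be read as a prediction for the ref guide.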
On the jira/solr-10290 branch, I've already hooked this new "ReducePDFSize"
tool into the build process, but you can still compare the "RAW" PDF produced
by asciidoctor with the final output:
{noformat}
hossman@tray:~/lucene/dev/solr [solr-10290] $ du -sh build/solr-ref-guide/pdf-tmp/*.pdf build/solr-ref-guide/*.pdf
27M build/solr-ref-guide/pdf-tmp/RAW-apache-solr-ref-guide-7.0.pdf
9.4M build/solr-ref-guide/apache-solr-ref-guide-7.0.pdf
{noformat}
As far as I can tell, the PDFBox code hasn't "broken" anything in the
original PDF, but more eyeballs would be helpful to verify that.
There might be more gains to be made in reducing the size, but I'd vote for
calling this a win, moving on, and leaving any questions of further reductions
for future issues.
> Reduce size of new Ref Guide PDF
> --------------------------------
>
> Key: SOLR-10298
> URL: https://issues.apache.org/jira/browse/SOLR-10298
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: documentation
> Reporter: Cassandra Targett
>
> The new Ref Guide PDF is ~31MB in size, which is more than 2x the current PDF
> produced by Confluence (which is 14MB).
> The asciidoctor-pdf project has a script to optimize the PDF, mostly by
> scaling down images. When I run this tool on the new PDF, the size is reduced
> to ~18MB. (More info on this script:
> https://github.com/asciidoctor/asciidoctor-pdf#optional-scripts).
> Some of the current image files are very large, so I believe that by
> scaling the images down, we can make the PDF smaller without adding a step
> in the build to run the optimize script programmatically (it also has a
> dependency on Ghostscript, so it would be nice to not add another dependency
> if it can be avoided).
> The new PDF is also about 300 pages longer, but this issue is primarily
> concerned with file size. However, reducing the number of pages will also
> make it smaller. A few things that could be tried to reduce the # of pages:
> * Reduce font sizes
> * Increase page margins
> * Review options for when a forced page-break is used and modify if possible
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)