[ https://issues.apache.org/jira/browse/SOLR-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935230#comment-15935230 ]
Cassandra Targett commented on SOLR-10298:
------------------------------------------
With [~ab]'s help, I figured out a couple of explanations for the size.
First, I reduced the resolution on some of the largest PNG files, and that
reduced the size of the PDF by about 4 MB (to 27 MB). Using the optimize-pdf
script [1] from the asciidoctor-pdf project (the tool we're using for the PDF
conversion here [2]), the PDF can be compressed to about 12 MB, which is
smaller than the current PDF (which is 14 MB).
I believe we will need to use that script, or something that does the same
thing, in our build process. Andrzej helped me figure out that the font
dictionary is used on nearly every page, which tells us the PDF is not
compressed. As further evidence of the lack of compression, I gzipped the file
and the resulting .gz file was ~70% smaller. After running the PDF through the
optimize-pdf script, we can see {{/Filter /FlateDecode}} added to the file,
which tells us that it is now compressed. The asciidoctor-pdf project uses
Prawn, and this is apparently a known issue [3], at least in the version that
project is using (I didn't dig very far into that part of it).
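The two checks described above (how much smaller the file gets when gzipped, and whether {{/FlateDecode}} filters are present) can be sketched in a few lines of Python. This is only a rough illustration and not part of the build; the function name {{compression_report}} is made up for this example, and counting byte occurrences is a heuristic, not a real PDF parse.

```python
import gzip


def compression_report(pdf_bytes: bytes) -> dict:
    """Return rough indicators of whether a PDF's content is compressed.

    Heuristic only: it gzips the whole file and counts the literal
    /FlateDecode token, without parsing the PDF structure.
    """
    gz_size = len(gzip.compress(pdf_bytes))
    size = len(pdf_bytes)
    return {
        "size": size,
        "gzip_size": gz_size,
        # Fraction saved by gzip; a large value suggests uncompressed streams.
        "gzip_saving": (1 - gz_size / size) if size else 0.0,
        # Matches both "/Filter /FlateDecode" and "/Filter/FlateDecode".
        "flate_filters": pdf_bytes.count(b"/FlateDecode"),
    }


# Hypothetical usage (the file name is an assumption):
#   from pathlib import Path
#   print(compression_report(Path("ref-guide.pdf").read_bytes()))
```

A large gzip saving together with a filter count of zero would be consistent with the uncompressed PDF described above.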
The issue with the optimize-pdf script from the asciidoctor-pdf project is that
it depends on Ghostscript. Maybe that is a simple problem to solve, but we have
not yet spent any time figuring out whether Ghostscript can easily be added to
the ant target, so that it does not have to be pre-installed by anyone who
wants to create a PDF locally.
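To make the dependency concrete, here is a minimal sketch of what a build step wrapping Ghostscript might look like. It is not the optimize-pdf script itself: the function names are hypothetical, the flags are standard Ghostscript {{pdfwrite}} options chosen for illustration, and they are not necessarily the ones that script uses. The step is skipped when no {{gs}} executable is found.

```python
from __future__ import annotations

import shutil
import subprocess


def build_gs_command(src: str, dest: str, gs: str = "gs") -> list[str]:
    """Build a Ghostscript command that rewrites src as a compressed PDF."""
    return [
        gs,
        "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=pdfwrite",          # re-emit the document as PDF
        "-dPDFSETTINGS=/default",     # illustrative preset, an assumption
        f"-sOutputFile={dest}",
        src,
    ]


def optimize_pdf(src: str, dest: str) -> bool:
    """Run Ghostscript if it is installed; return False if it is missing."""
    # Assumes the executable is called "gs"; on Windows it is usually
    # named differently (for example gswin64c).
    gs = shutil.which("gs")
    if gs is None:
        return False
    subprocess.run(build_gs_command(src, dest, gs), check=True)
    return True
```

Returning False instead of failing would let a local build still produce the larger, unoptimized PDF when Ghostscript is not installed.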
It's also entirely conceivable that a similar script could be written that does
the same things, but it would be beyond my abilities at this point.
[1] https://github.com/asciidoctor/asciidoctor-pdf/blob/master/bin/optimize-pdf
[2] https://github.com/asciidoctor/asciidoctor-pdf
[3] https://github.com/asciidoctor/asciidoctorj/issues/476#issuecomment-246201886
(the last paragraph explains the Prawn issue that is causing this)
> Reduce size of new Ref Guide PDF
> --------------------------------
>
> Key: SOLR-10298
> URL: https://issues.apache.org/jira/browse/SOLR-10298
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: documentation
> Reporter: Cassandra Targett
>
> The new Ref Guide PDF is ~31 MB in size, which is more than 2x the current PDF
> produced by Confluence (which is 14 MB).
> The asciidoctor-pdf project has a script to optimize the PDF, mostly by
> scaling down images. When I run this tool on the new PDF, the size is reduced
> to ~18 MB. (More info on this script:
> https://github.com/asciidoctor/asciidoctor-pdf#optional-scripts).
> Some of the current image files are very large in size, so I believe that by
> scaling the images down, we can make the size smaller without adding a step
> in the build to run the optimize script programmatically (it also has a
> dependency on GhostScript, so it would be nice to not add another dependency
> if it can be avoided).
> The new PDF is also about 300 pages longer, but this issue is primarily
> concerned with file size. However, reducing the number of pages will also
> make it smaller. A few things that could be tried to reduce the # of pages:
> * Reduce font sizes
> * Decrease page margins
> * Review options for when a forced page-break is used and modify if possible