[
https://issues.apache.org/jira/browse/SOLR-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943962#comment-15943962
]
Cassandra Targett commented on SOLR-10298:
------------------------------------------
While I was out last week, Andrzej did some more analysis, and found a few
things that are actually rather wrong with the new PDF:
* all the text is represented in hex codes, which adds more bytes. He thought
this might be related to using custom fonts.
* each piece of text is output in tiny chunks, which he thought might be a
side-effect of full justification.
* the text color was being set to black, then to the dark gray I'd chosen for
the text, then back to black. He thought using plain black throughout might
help with this.
However, removing all of those customizations did not yield a smaller size. The
asciidoctor-pdf project includes a basic theme that only uses the built-in
fonts (Helvetica, Times, Courier, etc.), is left-justified, and defaults to
black. Using this theme produces a 28MB PDF, pretty much the same as with the
customizations I'd made. So I don't think we can tweak our way out of this;
it's simply a bug or not-yet-implemented feature of asciidoctor-pdf.
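To make the first three points above concrete, here is a rough, hypothetical Java illustration (not taken from actual asciidoctor-pdf or Prawn output) of how one line of text grows in an uncompressed PDF content stream when it is written as a hex string instead of a literal string, and again when it is split into per-word chunks wrapped in fill color changes. Tj and rg are standard PDF operators, but the exact operator sequence is an assumption modeled on Andrzej's description, and the sample sentence is made up.

```java
import java.nio.charset.StandardCharsets;

public class TextEncodingOverhead {

    // Literal string operand, e.g. "(Solr) Tj".
    // Assumes plain ASCII text with no characters that need escaping.
    static String literalShow(String text) {
        return "(" + text + ") Tj\n";
    }

    // Hex string operand, e.g. "<536F6C72> Tj".
    // Each byte becomes two hex digits, so the operand doubles in size.
    static String hexShow(String text) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : text.getBytes(StandardCharsets.US_ASCII)) {
            sb.append(String.format("%02X", b));
        }
        return sb.append("> Tj\n").toString();
    }

    // Hypothetical pattern: one Tj per word, surrounded by fill color
    // changes (black, dark gray, text, back to black).
    static String chunkedHexShow(String text) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.split(" ")) {
            sb.append("0 0 0 rg\n");       // set fill color to black
            sb.append("0.2 0.2 0.2 rg\n"); // set fill color to dark gray
            sb.append(hexShow(word));      // show one small chunk of text
            sb.append("0 0 0 rg\n");       // back to black
        }
        return sb.toString();
    }

    // Size in bytes of a fragment of content-stream operators.
    static int size(String ops) {
        return ops.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String line = "Solr is the popular open source enterprise search platform";
        System.out.println("literal: " + size(literalShow(line)) + " bytes");
        System.out.println("hex:     " + size(hexShow(line)) + " bytes");
        System.out.println("chunked: " + size(chunkedHexShow(line)) + " bytes");
    }
}
```

The hex form doubles every text operand, and the chunked form adds several color operators per word; these are sizes before any stream compression is applied.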
bq. I looked at pdfbox's examples, and none of them seem to be aimed at the
sort of cleanup we're after, so although it might work for us, I don't see a
quick win there.
Searching Google for "pdfbox flate", I came across these javadocs:
https://pdfbox.apache.org/docs/2.0.1/javadocs/org/apache/pdfbox/pdmodel/common/PDStream.html.
It seems you can use PDFBox to compress a stream with the FLATE_DECODE filter
(the addCompression method appears to be deprecated in PDFBox 2.0, but its
deprecation note describes the replacement approach). Perhaps that helps?
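Before wiring anything into PDFBox, it might be worth estimating what Flate compression would buy. The PDF FlateDecode filter uses zlib/deflate data, which java.util.zip.Deflater produces in its default (non-nowrap) mode, so a JDK-only sketch can compress a sample content stream and compare sizes. This assumes the content streams in the new PDF are currently stored uncompressed, which hasn't been confirmed; the sample stream is hypothetical, and actually applying the filter inside the PDF would still go through PDFBox's PDStream as described in the javadocs above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateEstimate {

    // Hypothetical uncompressed content stream: many tiny hex-string text
    // chunks with repeated color changes, repeated `repeats` times.
    static byte[] sampleStream(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("0 0 0 rg\n0.2 0.2 0.2 rg\n<536F6C72> Tj\n0 0 0 rg\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Compress to zlib format (RFC 1950), the format /FlateDecode expects.
    static byte[] flate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress, used here only to check that the round trip is lossless.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported zlib data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleStream(2000);
        byte[] packed = flate(raw);
        System.out.printf("raw %d bytes, flate %d bytes%n", raw.length, packed.length);
    }
}
```

Real content streams are far less repetitive than this sample, so the ratio printed here is an upper bound on the savings; if the streams in the PDF turn out to be compressed already, the estimate doesn't apply at all.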
> Reduce size of new Ref Guide PDF
> --------------------------------
>
> Key: SOLR-10298
> URL: https://issues.apache.org/jira/browse/SOLR-10298
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: documentation
> Reporter: Cassandra Targett
>
> The new Ref Guide PDF is ~31MB in size, which is more than 2x the current PDF
> produced by Confluence (which is 14MB).
> The asciidoctor-pdf project has a script to optimize the PDF, mostly by
> scaling down images. When I run this tool on the new PDF, the size is reduced
> to ~18MB. (More info on this script:
> https://github.com/asciidoctor/asciidoctor-pdf#optional-scripts).
> Some of the current image files are very large in size, so I believe that by
> scaling the images down, we can make the size smaller without adding a step
> in the build to run the optimize script programmatically (it also has a
> dependency on Ghostscript, so it would be nice to not add another dependency
> if it can be avoided).
> The new PDF is also about 300 pages longer, but this issue is primarily
> concerned with file size. However, reducing the number of pages will also
> make it smaller. A few things that could be tried to reduce the number of pages:
> * Reduce font sizes
> * Increase page margins
> * Review options for when a forced page-break is used and modify if possible
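The issue description suggests scaling the source images down instead of running the optimize script and pulling in Ghostscript. A minimal sketch of that step using only the JDK's java.awt imaging classes, assuming the images are PNGs and that a maximum width of 1200 pixels is acceptable for the guide (an arbitrary, hypothetical value; the file path in the comment is hypothetical too):

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ImageDownscale {

    // Scale an image so its width is at most maxWidth pixels, keeping the
    // aspect ratio. Images that are already narrow enough are returned as-is.
    static BufferedImage downscale(BufferedImage src, int maxWidth) {
        if (src.getWidth() <= maxWidth) {
            return src;
        }
        int w = maxWidth;
        int h = Math.max(1, (int) Math.round((double) src.getHeight() * maxWidth / src.getWidth()));
        // ARGB keeps any transparency the source PNG had.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        // With real files (hypothetical path), using javax.imageio.ImageIO:
        //   BufferedImage in = ImageIO.read(new File("images/screenshot.png"));
        //   ImageIO.write(downscale(in, 1200), "png", new File("images/screenshot.png"));
        BufferedImage big = new BufferedImage(3000, 1500, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = downscale(big, 1200);
        System.out.println(small.getWidth() + "x" + small.getHeight());
    }
}
```

How much this shrinks a given PNG depends on its content, so it's worth comparing file sizes before and after rather than assuming a fixed saving.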
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)