[ 
https://issues.apache.org/jira/browse/SOLR-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943962#comment-15943962
 ] 

Cassandra Targett commented on SOLR-10298:
------------------------------------------

While I was out last week, Andrzej did some more analysis, and found a few 
things that are actually rather wrong with the new PDF:

* all the text is represented in hex codes, which adds more bytes. He thought 
this might be related to using custom fonts.
* each piece of text is output in tiny chunks, which he thought might be a 
side-effect of full justification.
* the text color was being set to black, then the dark gray I'd chosen for the 
text, then back to black. He thought maybe using full black would help this.
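
To illustrate the first point, here is a minimal sketch of why hex-encoded text takes more bytes than literal text in a PDF content stream. It assumes a single-byte font encoding; with embedded custom fonts the codes are often two bytes per glyph, which would make the hex form larger still. The class and method names are made up for this example and are not part of asciidoctor-pdf or PDFBox.

```java
import java.nio.charset.StandardCharsets;

public class PdfStringSize {
    // Encode text as a PDF literal string, e.g. (Hello).
    // Backslashes and parentheses are escaped with a backslash.
    public static String literal(String text) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Encode text as a PDF hex string, e.g. <48656C6C6F>.
    // Assumption: one byte per character, so two hex digits per character.
    public static String hex(String text) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : text.getBytes(StandardCharsets.ISO_8859_1)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        String sample = "Reduce size of new Ref Guide PDF";
        String lit = literal(sample);
        String hx = hex(sample);
        System.out.println(lit + " -> " + lit.length() + " bytes");
        System.out.println(hx + " -> " + hx.length() + " bytes");
    }
}
```

Under that assumption the hex form is roughly twice the size of the literal form before any stream compression is applied.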

However, removing all of those customizations did not make the file any 
smaller. The asciidoctor-pdf project includes a basic theme that only uses the 
included fonts (Helvetica, Times, Courier, etc.), is left-justified, and 
defaults to black. Using this theme produces a 28MB PDF, pretty much the same 
as with the customizations I'd made. So I don't think we can tweak our way out 
of this; it's simply a bug or not-yet-implemented feature of asciidoctor-pdf.

bq. I looked at pdfbox's examples, and none of them seem to be aimed at the 
sort of cleanup we're after, so although it might work for us, I don't see a 
quick win there.

Searching Google for "pdfbox flate", I came across these javadocs: 
https://pdfbox.apache.org/docs/2.0.1/javadocs/org/apache/pdfbox/pdmodel/common/PDStream.html
It seems you can use PDFBox to compress a stream with the FLATE_DECODE filter 
(the method addCompression appears to be deprecated in PDFBox 2.0, but the 
deprecation note explains how to do it the new way). Perhaps that helps?
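
As a rough sketch of what Flate compression could buy us, the snippet below deflates a made-up content stream. It deliberately does not use PDFBox: it relies only on java.util.zip from the JDK, which implements the zlib/deflate format that PDF's FlateDecode filter is based on. The sample stream (repeated color changes around tiny hex-encoded text chunks) is an invented stand-in for what Andrzej described, not actual output from our PDF, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateSketch {
    // Compress bytes in zlib format (the default Deflater wrapper).
    public static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress zlib-format bytes; used here only to check the round trip.
    public static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input; stop instead of looping
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Invented sample: black, dark gray, a tiny hex text chunk, back to black.
    public static byte[] sampleStream() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("0 0 0 rg\n0.2 0.2 0.2 rg\n<0048> Tj\n0 0 0 rg\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleStream();
        byte[] packed = deflate(raw);
        System.out.println("raw: " + raw.length + " bytes, deflated: " + packed.length + " bytes");
    }
}
```

Highly repetitive operators like these compress very well, so if the streams in our PDF are currently uncompressed, applying the filter through PDFBox might recover a good part of the difference. That is a guess until someone tries it on the real file.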

> Reduce size of new Ref Guide PDF
> --------------------------------
>
>                 Key: SOLR-10298
>                 URL: https://issues.apache.org/jira/browse/SOLR-10298
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: documentation
>            Reporter: Cassandra Targett
>
> The new Ref Guide PDF is ~31MB in size, which is more than 2x the current PDF 
> produced by Confluence (which is 14MB).
> The asciidoctor-pdf project has a script to optimize the PDF, mostly by 
> scaling down images. When I run this tool on the new PDF, the size is reduced 
> to ~18MB. (More info on this script: 
> https://github.com/asciidoctor/asciidoctor-pdf#optional-scripts).
> Some of the current image files are very large in size, so I believe that by 
> scaling the images down, we can make the size smaller without adding a step 
> in the build to run the optimize script programmatically (it also has a 
> dependency on GhostScript, so it would be nice to not add another dependency 
> if it can be avoided).
> The new PDF is also about 300 pages longer, but this issue is primarily 
> concerned with file size. However, reducing the number of pages will also 
> make it smaller. A few things that could be tried to reduce the # of pages:
> * Reduce font sizes
> * Reduce page margins
> * Review options for when a forced page-break is used and modify if possible



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
