Craig,

I haven't even finished reading your message but it is a keeper. Just wanted to 
let you know how much I appreciate you taking the time to go into the depth on 
this subject as you have.

Thanks

Marijan (Mario) Madunic
Publishing Specialist
New Flyer Industries

-----Original Message-----
From: Craig Ringer [mailto:[email protected]] 
Sent: Wednesday, June 23, 2010 8:34 AM
To: [email protected]
Cc: Mario Madunic
Subject: Re: Compression of output question

On 23/06/10 20:45, Mario Madunic wrote:
> Well first off, I do not know much if anything about PDF compression methods, 
> techniques, or technologies.

[snip]

> So my question is what can be done to bring the file size down? Am I 
> misunderstanding PDF compression and what is possible? Just so you know how 
> I'm thinking of compression, when I'm working in Photoshop and exporting for 
> the web and using the percentage slider to see the quality of pixilation I'm 
> getting. Or is PDF compression more like Zipping a file?

PDF compression is ... complicated.

PDF files contain a series of 'objects', which describe pages, page
contents, fonts, images, etc. These objects may have 'streams' of
data associated with them, such as JPEG image data, the contents of a
font, or sequences of PDF drawing operations in a simplified
PostScript-like text notation.

PDF streams may be filtered in a number of ways to compress or otherwise
process them. Some types of content, like image data, have special
compression options that apply only to that data type - say JPEG
compression, CCITT fax compression, etc. Other streams can only be
compressed with generalized lossless compression algorithms, of which
Flate (ie zlib/deflate, the same algorithm gzip uses) is the one used in
practice; the spec also defines older LZW and run-length filters. This
is true of the PDF content streams that contain the drawing operations,
and of fonts, among other things.
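
To give a feel for what lossless Flate compression does to a content
stream, here is a minimal Python sketch. The drawing operators in
"content" are made up for illustration and are not taken from any real
PDF; it only shows that repetitive text-like data shrinks a lot and
comes back byte-for-byte identical.

```python
import zlib

# A made-up content stream: the same few drawing operators repeated,
# which is roughly what a page full of similar shapes looks like.
content = b"0 0 m 100 0 l 100 100 l 0 100 l h S\n" * 200

# FlateDecode streams hold zlib-format data (RFC 1950), not a gzip file.
compressed = zlib.compress(content, 9)

# Lossless: decompressing gives back exactly the original bytes.
restored = zlib.decompress(compressed)

print(len(content), "bytes raw,", len(compressed), "bytes deflated")
```

Unlike the Photoshop quality slider, there is no quality trade-off
here: the only knob is how hard the compressor works.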

Additionally, for extra space savings the PDF object structure itself
may be compressed into "object streams" (PDF 1.5 or newer only) if the
PDF producer supports them.

As you're probably beginning to see, it's not as simple as "compressing"
the PDF. You need to know what parts of the PDF are taking up the space
before you can make reasonable decisions.

In most cases, it'll turn out that most of the space is taken up by big
raster (ie bitmap) images. In this case your options for shrinking the
PDF are limited to using stronger/lossier image compression, and/or
resampling the images to lower resolutions. This is best done at
production time if at all possible, since resampling and recompressing
after the PDF is originally produced is generally lossy and can result
in lower image quality for a given file size. If you can't set image
resolution limits at PDF production time, you can use a post-processing
tool like Adobe Acrobat (**NOT** Adobe Reader) to "optimize" the PDF by
resampling and recompressing images, but this can degrade quality
significantly.

Another thing that can make a PDF big is fonts - particularly large Asian
(CJK) fonts - being embedded in their entirety into the PDF rather than
being subset so that only the glyphs that are actually used get included.
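
One rough way to check for subsetting without Acrobat: subset fonts are
named with a tag of six uppercase letters and a "+" in front of the real
font name. The Python sketch below scans raw PDF bytes for /BaseFont
entries and reports which names carry that tag. The function name and
the sample bytes are hypothetical, and it will miss font dictionaries
stored inside compressed object streams. A name without the tag may
also mean the font is not embedded at all, so treat the result as a
hint only.

```python
import re

# Subset fonts carry a tag of six uppercase letters plus "+" in front of
# the real font name, e.g. /BaseFont /ABCDEF+DejaVuSans.
BASEFONT = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")
SUBSET_TAG = re.compile(rb"[A-Z]{6}\+")

def list_fonts(pdf_bytes):
    """Return {font_name: has_subset_tag} for each /BaseFont in the raw bytes."""
    fonts = {}
    for match in BASEFONT.finditer(pdf_bytes):
        name = match.group(1)
        fonts[name.decode("latin-1")] = SUBSET_TAG.match(name) is not None
    return fonts

# Hypothetical fragment of two font dictionaries, for illustration only.
sample = (b"<< /Type /Font /BaseFont /ABCDEF+DejaVuSans >>\n"
          b"<< /Type /Font /BaseFont /Helvetica >>\n")
fonts = list_fonts(sample)
```

On a real file you would pass in open(path, "rb").read() instead of the
sample bytes.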

Yet another possible reason for big PDFs is if you have complex vector
graphics that can't be expressed directly in PDF drawing operations, or
graphics that your PDF producer doesn't know how to convert to PDF
drawing operations. In this situation the PDF producer has to "flatten"
the vector artwork to a raster image - which can, depending on the PDF's
required resolution, be quite huge - and include the raster image. This
is particularly common when dealing with vector graphics that use alpha
transparency while producing PDF 1.3 or older, which does not support it.
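
To put numbers on "quite huge", here is a small Python sketch of the
arithmetic for an uncompressed raster. The function name, the US Letter
page size and the 8-bit RGB assumption are mine, chosen for
illustration; real producers may use other colour spaces and will
normally compress the result.

```python
def raw_raster_bytes(width_in, height_in, dpi, channels=3, bits=8):
    """Uncompressed size of a raster covering width_in x height_in inches."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits // 8

# A US Letter page (8.5 x 11 in) flattened to 8-bit RGB.
at_300 = raw_raster_bytes(8.5, 11, 300)   # 25,245,000 bytes
at_150 = raw_raster_bytes(8.5, 11, 150)   # 6,311,250 bytes
```

Halving the resolution quarters the pixel count, which is why
resolution limits have such a large effect on file size.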

To find out how much space different parts of the PDF use, you're best
off opening it up with a tool like Adobe Acrobat (**NOT** Adobe Reader,
the crippled view-only version) and using the PDF size analysis tools of
its PDF Optimizer feature. There are other tools that do PDF size
analysis too, though.
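
If you have no such tool to hand, a crude first look is possible with a
few lines of Python. The sketch below scans the raw bytes for objects
with streams and tallies the stored stream sizes by kind (image or
other) and by the first filter named. Everything in it (function name,
regular expressions, sample data) is my own illustration, not how any
real analysis tool works. It cannot see objects packed inside compressed
object streams, and it can be fooled by binary data that happens to
contain the keywords it looks for.

```python
import re
from collections import Counter

OBJECT = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
STREAM = re.compile(rb"(.*?)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
FILTER = re.compile(rb"/Filter\s*\[?\s*/(\w+)")
IMAGE = re.compile(rb"/Subtype\s*/Image\b")

def stream_sizes(pdf_bytes):
    """Tally stored stream bytes by (kind, first filter name)."""
    totals = Counter()
    for obj in OBJECT.finditer(pdf_bytes):
        found = STREAM.match(obj.group(1))
        if found is None:
            continue  # this object has no stream
        header, data = found.groups()
        kind = "image" if IMAGE.search(header) else "other"
        flt = FILTER.search(header)
        name = flt.group(1).decode("ascii") if flt else "none"
        totals[(kind, name)] += len(data)
    return totals

# Tiny hand-written stand-in for a PDF body, for illustration only.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /XObject /Subtype /Image /Filter /DCTDecode"
    b" /Length 10 >>\nstream\n0123456789\nendstream\nendobj\n"
    b"2 0 obj\n<< /Filter /FlateDecode /Length 4 >>\n"
    b"stream\nabcd\nendstream\nendobj\n"
    b"3 0 obj\n<< /Type /Catalog >>\nendobj\n"
)
totals = stream_sizes(sample)
```

If the "image" totals dominate, look at image resolution and
compression first; if "other" FlateDecode streams dominate, the bulk is
more likely fonts or vector content.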

If you have vector graphics like SVG included, they'll be reported as
part of the "image" byte count if they've been flattened, and as part of
the "pdf content streams" or "pdf objects" count if they're not
flattened. It can be hard to figure out whether they've been flattened
to raster or not - the best approach is really to either zoom in on them
to a crazy level (1600 times or more) on the PDF and see if they're
pixellated, or to use the object editor in a tool like Acrobat to see if
they're a single image or a bunch of individual lines/arcs/areas. If you
have access to more advanced PDF tools like Enfocus PitStop (which is
buggy, but amazingly useful) you can find out more.

I don't know much about FOP's PDF output capabilities, so I can't tell
you much about what it's doing with its input and how it produces the
output. In particular, I haven't the foggiest if it can render SVG
directly to PDF or if it's flattening it, and if it does try to render
it to PDF what cases it has to fall back to flattening for.

--
Craig Ringer

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



