Craig, I haven't even finished reading your message, but it is a keeper. I just wanted to let you know how much I appreciate you taking the time to go into such depth on this subject.
Thanks

Marijan (Mario) Madunic
Publishing Specialist
New Flyer Industries

-----Original Message-----
From: Craig Ringer [mailto:[email protected]]
Sent: Wednesday, June 23, 2010 8:34 AM
To: [email protected]
Cc: Mario Madunic
Subject: Re: Compression of output question

On 23/06/10 20:45, Mario Madunic wrote:
> Well first off, I do not know much if anything about PDF compression methods,
> techniques, or technologies.
[snip]
> So my question is what can be done to bring the file size down? Am I
> misunderstanding PDF compression and what is possible? Just so you know how
> I'm thinking of compression, when I'm working in Photoshop and exporting for
> the web and using the percentage slider to see the quality of pixelation I'm
> getting. Or is PDF compression more like Zipping a file?

PDF compression is ... complicated.

PDF files contain a series of 'objects', which describe pages, page contents, fonts, images, etc. These objects may have 'streams' of data associated with them, such as JPEG image data, the contents of a font, or sequences of PDF drawing operations in a simplified PostScript-like text notation.

PDF streams may be filtered in a number of ways to compress or otherwise process them. Some types of content, like image data, have special compression options that apply only to that data type, say JPEG compression, CCITT fax compression, etc. Other streams can only be compressed with general-purpose lossless compression algorithms, of which Deflate (the algorithm behind zlib and gzip, called FlateDecode in PDF) is by far the most commonly used. This is true of the PDF content streams that contain the drawing operations, and of fonts, among other things. Additionally, for extra space savings the PDF object structure itself may be compressed into "object streams" (PDF 1.5 or newer only) if the PDF producer supports them.

As you're probably beginning to see, it's not as simple as "compressing" the PDF.
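To make the "general-purpose lossless compression" point concrete, here is a minimal Python sketch that Deflate-compresses a content stream with the standard `zlib` module and wraps it the way a FlateDecode stream appears inside a PDF file. The `flate_stream` helper and the sample drawing operators are made up for illustration; a real PDF writer would also have to emit the object number, `endobj`, and a cross-reference entry, which this sketch leaves out.

```python
import zlib


def flate_stream(content: bytes) -> bytes:
    """Return a FlateDecode-filtered PDF stream (dictionary + stream data).

    Hypothetical helper: a real PDF writer would also emit the
    "N 0 obj ... endobj" wrapper and a cross-reference entry.
    """
    # zlib.compress produces zlib-wrapped Deflate data, which is the format
    # the PDF /FlateDecode filter expects.
    compressed = zlib.compress(content, 9)
    # /Length counts only the compressed bytes, not the EOL before "endstream".
    header = b"<< /Length %d /Filter /FlateDecode >>\n" % len(compressed)
    return header + b"stream\n" + compressed + b"\nendstream"


if __name__ == "__main__":
    # A made-up content stream: many repeated path operators (move, line, stroke).
    ops = b"10 10 m 200 200 l S\n" * 500
    stream = flate_stream(ops)
    print("raw content stream:", len(ops), "bytes")
    print("Flate-compressed stream object:", len(stream), "bytes")
```

Repetitive drawing operators like these compress very well. Already-compressed JPEG data, by contrast, gains almost nothing from a second Deflate pass, which is why image data gets its own specialized filters.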
You need to know what parts of the PDF are taking up the space before you can make reasonable decisions.

In most cases, it'll turn out that most of the space is taken up by big raster (i.e. bitmap) images. In this case your options for shrinking the PDF are limited to using stronger/lossier image compression and/or resampling the images to lower resolutions. This is best done at production time if at all possible, since resampling and recompressing after the PDF has been produced is generally lossy and can result in lower image quality for a given file size. If you can't set image resolution limits at PDF production time, you can use a post-processing tool like Adobe Acrobat (**NOT** Adobe Reader) to "optimize" the PDF by resampling and recompressing images, but this can degrade quality significantly.

Another thing that can make a PDF big is fonts, particularly large Asian (CJK) fonts, being embedded in their entirety rather than being subset so that only the glyphs that are actually used get included.

Yet another possible reason for big PDFs is complex vector graphics that can't be expressed directly as PDF drawing operations, or graphics that your PDF producer doesn't know how to convert to PDF drawing operations. In this situation the PDF producer has to "flatten" the vector artwork to a raster image (which, depending on the PDF's required resolution, can be huge) and include that image instead. This is particularly common with vector graphics that use alpha transparency when producing PDF 1.3 or older, which does not support transparency.

To find out how much space different parts of the PDF use, you're best off opening it in a tool like Adobe Acrobat (**NOT** Adobe Reader, the crippled view-only version) and using the space-usage analysis in its PDF Optimizer feature. There are other tools that do PDF size analysis too, though.
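To get a feel for why big raster images, including flattened vector artwork, tend to dominate file size, the rough arithmetic below computes the uncompressed size of a bitmap from its physical dimensions and resolution. The `raster_bytes` function is a hypothetical illustration; it gives raw sample size only, before any JPEG or Flate compression is applied.

```python
def raster_bytes(width_in: float, height_in: float, dpi: int,
                 channels: int = 3, bits_per_channel: int = 8) -> int:
    """Uncompressed size, in bytes, of a raster image covering
    width_in x height_in inches at the given resolution.

    channels=3 is RGB, 1 is greyscale, 4 is CMYK. As in PDF image data,
    each row of samples is padded to a whole number of bytes.
    """
    px_w = round(width_in * dpi)
    px_h = round(height_in * dpi)
    bits_per_row = px_w * channels * bits_per_channel
    bytes_per_row = (bits_per_row + 7) // 8  # round up to a byte boundary
    return bytes_per_row * px_h


if __name__ == "__main__":
    # An 8 x 10 inch RGB image, e.g. a flattened full-page illustration.
    for dpi in (300, 150, 72):
        print(f"{dpi:>3} dpi: {raster_bytes(8, 10, dpi):>12,} bytes uncompressed")
```

Halving the resolution cuts the pixel count, and therefore the uncompressed size, to a quarter: 21,600,000 bytes at 300 dpi versus 5,400,000 bytes at 150 dpi for the 8 x 10 inch example. How much compression then saves depends heavily on the image content, so treat these numbers as a sense of scale rather than a prediction of final PDF size.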
If you have vector graphics like SVG included, they'll be reported as part of the "image" byte count if they've been flattened, and as part of the "content streams" or "PDF objects" count if they haven't. It can be hard to tell whether they've been flattened to raster or not. The best approach is either to zoom in on them to an extreme level (1600% or more) and see if they're pixelated, or to use the object editor in a tool like Acrobat to see whether they're a single image or a bunch of individual lines/arcs/areas. If you have access to more advanced PDF tools like Enfocus PitStop (which is buggy, but amazingly useful) you can find out more.

I don't know much about FOP's PDF output capabilities, so I can't tell you much about what it's doing with its input and how it produces the output. In particular, I haven't the foggiest whether it can render SVG directly to PDF or whether it flattens it, and, if it does render it to PDF, in which cases it has to fall back to flattening.

--
Craig Ringer

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

--------------------------------------------------------------------
Please consider the environment before printing this e-mail.

CONFIDENTIALITY STATEMENT: This communication (and any and all information or material transmitted with this communication) is confidential, may be privileged and is intended only for the use of the intended recipient. If you are not the intended recipient, any review, retransmission, circulation, distribution, reproduction, conversion to hard copy, copying or other use of this communication, information or material is strictly prohibited and may be illegal.
If you received this communication in error or if it is forwarded to you without the express authorization of New Flyer, please notify us immediately by telephone or by return email and permanently delete the communication, information and material from any computer, disk drive, diskette or other storage device or media. Thank you.
