For the last few months I have been working with Kevin McArthur on a comprehensive PDF generation project for a client [Streamflow] who has some pretty advanced layout needs. The project is nearing completion and we have been discussing the possibility of contributing large portions of the code back to the Zend Framework as improvements to Zend_Pdf.

In light of several recent postings to fw-general and fw-formats, as well as a few encouraging proposals recently submitted to the Wiki, we would like to formally announce our plans and describe the new functionality at a high level here.

We will be submitting proposals in the coming weeks that describe these new components in more detail along with fully-functional reference implementations. Our hope is to join forces with other interested developers to help fast-track these proposals through the feedback and approval process, write tests, user documentation, and examples, and exercise the code as much as possible.

We're really proud of this work and are excited to share it with the community. We believe that these enhancements will further establish Zend_Pdf's role as the gold standard for PDF generation using PHP.


Text Layout Engine
------------------

"How do I wrap long lines of text?" This is probably the most commonly asked question regarding Zend_Pdf. I'm pleased to report that we have solved not only text wrapping but a whole host of other layout problems as well. The new engine provides fully automatic text layout and has customization hooks in a variety of places.

Line breaks are calculated using the Unicode Line Breaking Algorithm (UAX #14), so lines break at linguistically appropriate points, not just at whitespace characters.

Paragraph styles allow you to specify left, center, and right alignment, as well as full justification, line leading, line height, line multiple (double-space, triple-space, etc.), pre- and post-paragraph spacing, left- and right-side margins, and first-line indentation. Paragraph styles also support left-, center-, right-, and decimal-aligned tab stops, with or without leaders, for intra-line alignment needs.

In addition to the left-to-right line sweep used by most Latin-based scripts, right-to-left line sweep is also supported, and is automatically detected by the layout engine; you never need to supply strings in reverse character order for right-to-left text layout.

The layout engine is based around the concept of an attributed string. Attributed strings are Unicode strings of unlimited length that fully support the entire Unicode character set, including characters outside the Basic Multilingual Plane (BMP).

Attributed strings allow you to assign stylistic attributes to arbitrary ranges of characters within the string. These attributes are used by typesetters to determine the specific look and location for every character. This means that you can make unlimited style changes within a block of text, even changing styles character-by-character if desired.

The layout engine automatically manages all of these style changes, applying them as necessary when drawing the text on the page. The following style attributes are supported:

 - Font
 - Font size
 - Fill color
 - Stroke width and color
 - Underline and strikethrough
 - Super- and sub-script
 - Background color

You can add your own custom attributes as well, which you can use in your own subclasses to completely customize the layout engine's behavior.
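To make the idea of an attributed string concrete, here is a rough, language-neutral sketch written in Python. It is not the proposed Zend_Pdf API: the class name `AttributedString`, its methods, and the attribute names are all hypothetical, chosen only to show how style attributes can be attached to arbitrary character ranges and then read back as style runs.

```python
from itertools import groupby


class AttributedString:
    """Hypothetical illustration: text plus per-character style attributes."""

    def __init__(self, text, **defaults):
        self.text = text
        # One attribute dict per character: simple, but not memory-efficient.
        self._attrs = [dict(defaults) for _ in text]

    def set_attributes(self, start, length, **attrs):
        """Apply attrs to the characters in [start, start + length)."""
        if start < 0 or length < 0 or start + length > len(self.text):
            raise IndexError("range lies outside the string")
        for i in range(start, start + length):
            self._attrs[i].update(attrs)

    def runs(self):
        """Yield (substring, attrs) for each maximal run of equal attributes."""
        pos = 0
        for attrs, group in groupby(self._attrs):
            n = len(list(group))
            yield self.text[pos:pos + n], dict(attrs)
            pos += n


s = AttributedString("Hello, world", font="Helvetica", size=12)
s.set_attributes(7, 5, fill_color="red", underline=True)
for substring, attrs in s.runs():
    print(repr(substring), attrs)
```

A typesetter would walk the runs in order and switch font, color, and so on only at run boundaries. Custom attributes fit the same model, since any keyword can be stored alongside the built-in ones.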

These attributed strings will eventually be shared with Zend_Rtf (recently proposed by Andries Seutens), as each attributed string is essentially a self-contained RTF document. This opens up the possibility for generating fully-styled PDF or RTF output from the same source with only a couple of lines of code. It will also eventually be possible to use existing styled RTF documents as the basis for PDF text drawing, eliminating the need to manually apply style attributes in your PHP code.

A layout manager class is responsible for drawing these attributed strings. It lays out the text in a series of arbitrarily-shaped text containers, automatically moving from one to the next as each is filled. Rectangular and circular containers will be provided, but you can easily create your own custom containers for other shapes or to flow text around images.

Multi-column output is as easy as creating two adjacent text containers on the same page. Text containers don't even need to be on the same PDF page: you can start your text in a small container on page 1, then continue it on page 17.

Callback functions are provided to allow you to create additional text containers as needed, which can be located on new pages. This is useful if you do not know the length of the text you are drawing ahead of time, or if you want to adapt your layout on-the-fly.
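The container flow described above can be sketched roughly as follows. This is a simplified Python illustration, not the proposed implementation: `Container`, `flow_text`, and the `need_container` callback are hypothetical names, containers are measured in characters and lines rather than points, and lines break only at whitespace instead of following UAX #14.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical rectangular container measured in characters and lines."""
    width: int
    max_lines: int
    page: int = 1
    lines: list = field(default_factory=list)

    def is_full(self):
        return len(self.lines) >= self.max_lines


def flow_text(text, containers, need_container=None):
    """Fill containers in order; ask need_container() for more when they run out."""
    pending = list(containers)
    used = []
    words = text.split()
    i = 0
    while i < len(words):
        if not used or used[-1].is_full():
            if pending:
                used.append(pending.pop(0))
            elif need_container is not None:
                # The callback receives the number of containers used so far.
                used.append(need_container(len(used)))
            else:
                raise ValueError("text does not fit in the given containers")
            continue
        box = used[-1]
        # Greedy line fill; a word wider than the container overflows its line.
        line = words[i]
        i += 1
        while i < len(words) and len(line) + 1 + len(words[i]) <= box.width:
            line += " " + words[i]
            i += 1
        box.lines.append(line)
    return used


first = Container(width=9, max_lines=2, page=1)
filled = flow_text(
    "one two three four five six seven eight",
    [first],
    need_container=lambda count: Container(width=20, max_lines=10, page=17),
)
for box in filled:
    print(box.page, box.lines)
```

Here the text starts in a small container on page 1 and, once that is full, the callback supplies a wider container on page 17 where the remaining text continues.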

You can also use multiple layout managers on a single page, allowing you to create complex multi-page flows for a series of text runs. These can be useful for creating page headers and footers, or for running stories side-by-side in a newsletter.


Drawing Model
-------------

Three new primitive geometry classes allow you to precisely define drawing locations, sizes, and regions. They also provide a host of convenience functions allowing for calculation, conversion, intersection testing, etc.:

 - Point: x and y coordinate
 - Size: height and width
 - Rectangle: combination of a point and size
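As a rough illustration of how such geometry primitives fit together, here is a minimal Python sketch. The class and method names are hypothetical and do not reflect the proposed Zend_Pdf classes. It assumes the rectangle's point is its lower-left corner, matching the default PDF coordinate system in which y increases upward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Size:
    width: float
    height: float


@dataclass(frozen=True)
class Rectangle:
    origin: Point  # lower-left corner (an assumption of this sketch)
    size: Size

    @property
    def max_x(self):
        return self.origin.x + self.size.width

    @property
    def max_y(self):
        return self.origin.y + self.size.height

    def contains(self, p):
        """True if the point lies inside or on the edge of the rectangle."""
        return (self.origin.x <= p.x <= self.max_x
                and self.origin.y <= p.y <= self.max_y)

    def intersection(self, other):
        """Return the overlapping rectangle, or None if there is no overlap."""
        x0 = max(self.origin.x, other.origin.x)
        y0 = max(self.origin.y, other.origin.y)
        x1 = min(self.max_x, other.max_x)
        y1 = min(self.max_y, other.max_y)
        if x1 <= x0 or y1 <= y0:
            return None
        return Rectangle(Point(x0, y0), Size(x1 - x0, y1 - y0))


a = Rectangle(Point(0, 0), Size(100, 50))
b = Rectangle(Point(60, 20), Size(100, 100))
print(a.intersection(b))
```

Keeping the value objects immutable means a rectangle can be shared between a text container and a drawing call without either one changing it behind the other's back.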

PDF pages are drawn using a series of content streams, which contain all of the low-level drawing commands. Zend_Pdf_Page currently manages its own private content stream.

We've separated content streams from Zend_Pdf_Page, promoting them to first-class objects. This allows us to use these content streams as templates that can be reused again and again, either on a single page or multiple pages. Templates can greatly reduce PDF file sizes and improve memory use and performance in PDF viewer applications.

It is also possible to create a template from any page in an existing PDF document. You can then reuse the template in the same PDF, or even copy it to a new PDF document, where you can use it as a page background, draw it as a thumbnail, perform imposition, etc.


Performance and Memory
----------------------

We've also made numerous performance and memory-usage improvements throughout the code. Most data is now lazily loaded, allowing you to manipulate very large documents (thousands or millions of individual objects, hundreds of megabytes or even gigabytes in size) with a very low memory footprint.
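The general idea behind lazy loading can be sketched as follows. This is a generic Python illustration, not our actual code: `LazyObjectTable` and its `load` callback are hypothetical. It relies only on the fact that a PDF's cross-reference table maps object numbers to byte offsets, so an object need not be parsed until something asks for it.

```python
class LazyObjectTable:
    """Hypothetical store that parses an object only on first access."""

    def __init__(self, offsets, load):
        self._offsets = offsets  # object number -> byte offset in the file
        self._load = load        # callable(offset) -> parsed object
        self._cache = {}

    def __getitem__(self, number):
        if number not in self._cache:
            # Parse on demand and remember the result for later lookups.
            self._cache[number] = self._load(self._offsets[number])
        return self._cache[number]

    def __len__(self):
        return len(self._offsets)


loaded = []


def load(offset):
    """Stand-in parser that records which offsets were actually read."""
    loaded.append(offset)
    return {"offset": offset}


table = LazyObjectTable({1: 15, 2: 230, 3: 4096}, load)
table[2]
table[2]
print(loaded)
```

Only the object that was requested is ever parsed, and it is parsed once, so memory use tracks what the script touches rather than the size of the document.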


Future Enhancements
-------------------

All of this new functionality lays the groundwork for even more powerful enhancements down the road:

 - Top-to-bottom line sweep for Asian scripts
 - Bi-directional text (for Hebrew, Arabic, and others)
 - Bulleted and numbered text lists
 - HTML-inspired inline text tables
 - Inline attachments (for example, images that flow with text)
 - Advanced typographic features such as tracking, pairwise kerning, ligatures, etc.
 - Hyphenation support
 - Glyph substitution using fallback fonts
 - and more...


Again, we're really excited to be sharing this code with the community. We'll be creating the proposals for the various components in the coming weeks and announcing them on the fw-formats list when they're ready for review. In the meantime, if you have any high-level questions, please don't hesitate to ask.

--

Willie Alberty, Owner
Spenlen Media
[EMAIL PROTECTED]

http://www.spenlen.com/
