On Nov 10, 2007, at 9:29 PM, Willie Alberty wrote:

On Nov 10, 2007, at 3:47 PM, Rick Gigger wrote:

A few questions:

1. Are you going to be using xsl-fo for your layout model?

No. XSL-FO does not provide enough fidelity for my needs. The text model I am working on is based on the OpenStep (Cocoa) frameworks, and has an attributed string class at its core. Attributes such as font, color, size, kerning, line height, tab stops, etc. may be applied to arbitrary ranges of characters within such strings.
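
To make the attributed-string idea concrete, here is a minimal sketch in Python. It is not Zend_Pdf or Cocoa code; the class name, method names, and the "later runs override earlier ones" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedString:
    """Plain text plus attribute runs over half-open character ranges."""
    text: str
    runs: list = field(default_factory=list)  # entries: (start, end, {name: value})

    def add_attributes(self, start, end, **attrs):
        # Attach attributes to the characters in [start, end).
        if not (0 <= start <= end <= len(self.text)):
            raise ValueError("range outside string")
        self.runs.append((start, end, attrs))

    def attributes_at(self, index):
        # Merge every run covering this character; later runs win on conflict
        # (an assumed policy, chosen only to keep the sketch short).
        merged = {}
        for start, end, attrs in self.runs:
            if start <= index < end:
                merged.update(attrs)
        return merged


s = AttributedString("Hello, world")
s.add_attributes(0, 12, font="Helvetica", size=12)  # whole string
s.add_attributes(7, 12, color="red", size=14)       # just "world"
```

A layout class would then walk the string, ask for the attributes at each position, and start a new drawing run whenever they change.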

...

In the end, Zend_Pdf will have three useful layers for document layout: an FO interpreter for multi-page document flows, the lower-level layout classes for drawing attributed strings with finer-grained control, and the existing drawing primitives already in Zend_Pdf for precise glyph-by-glyph placement.

I wish I had time to just wait for it to all be completed. :) I have implemented something that is not nearly as robust but that more or less handles all of my current needs and is working now. I will be interested to see what you think as I clean up the code and release for public use / comments.


2. Does your revised font code involve support for Unicode and/or non-Latin-1 fonts?

The font code in Zend_Pdf can already handle Unicode character sets. All fonts used by the framework, whether the standard 14 PDF fonts or custom OpenType fonts, extract and use a Unicode character-to-glyph map for drawing.

The reason that Zend_Pdf cannot currently draw non-Latin-1 strings lies in the text encoding method used when drawing the glyphs on the page. Right now, we've hard-coded using the WinAnsiEncoding method when drawing text (see Zend_Pdf_Resource_Font::encodeString()), primarily because it's a built-in encoding and doesn't require a lot of code to support.
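
The effect of a fixed single-byte encoding can be seen in a few lines of Python. This is only an illustration, not what Zend_Pdf_Resource_Font::encodeString() actually does; it assumes Python's cp1252 codec is a close enough stand-in for WinAnsiEncoding, and the function name is made up.

```python
def encode_winansi(text):
    # cp1252 is used here as an approximation of PDF's WinAnsiEncoding.
    # Characters with no slot in the code page are replaced with "?".
    return text.encode("cp1252", errors="replace")


latin = encode_winansi("R\u00e9sum\u00e9")        # accented Latin text survives
cjk = encode_winansi("\u65e5\u672c\u8a9e")        # three CJK characters do not
```

The font may well contain glyphs for the CJK characters, but a one-byte-per-character encoding has no way to address them.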

Drawing characters outside the Latin-1 range requires several changes to the font program's resource dictionary. At best, a /Differences array must be added, and at worst -- especially for CJK fonts -- a CID font would need to be created. Neither approach is very easy at this point because of how Zend_Pdf_Resource objects create and manage their resource dictionary.

The CID fonts are what I am interested in, as my main need here is to add Japanese and Chinese support. There is a fairly straightforward way to do this in FPDF, and there are links to examples on the front page (http://fpdf.org/ ). Looking at the font, resource and dictionary code in Zend_Pdf, I am having trouble seeing where exactly the problem is. It seems that adding CID font support to Zend_Pdf wouldn't be much, if any, harder than adding it to FPDF. Can you elaborate on what the specific difficulty is? Perhaps I just don't understand the code well enough. I think I might take a crack at adding CID font support in the near future and see if I can get it to work. Maybe then it will become clear to me what the specific difficulty is. Would you like to see my code if I do?

Let's assume for a moment that CID font support is complete and working. Now, someone is using it to create a PDF and calls $thisPage->drawText(...) and specifies a UTF-8 encoding for the text. (Remember, this is the future where CID font support exists and drawText doesn't convert everything down to Latin-1 anymore.) The text has Chinese characters in it. As far as I can see, it is still the caller's job to know the nature of the text being added before the call to drawText. If the current font is set to a Latin-only font, then the Chinese characters will show up as gibberish or not at all, or an error will be thrown, because the current font doesn't have glyphs for Chinese. So the caller must know that even though the text is Unicode, it contains Chinese, and so the font must be switched to one that has the Chinese glyphs (a CID font).

Is that your intention, or do you plan on actually including in Zend_Pdf the ability to just add Unicode text and have it pick an appropriate font, switch to it, convert the text to the encoding of the font, and then add the text? It seems like this functionality would make more sense in one of the upper layers that you spoke of above. Wherever it goes, though, the real tricky issue in my mind is as follows: You have a form that a user of your software fills out. Your site handles and stores all text as UTF-8. Somewhere you've got to decide what language it's in and what font it needs. This is made particularly difficult by ambiguous text (for example, some Japanese Kanji might look just like some Traditional Chinese characters) and mixed text (let's say you've got some Thai, Japanese, and Korean all in the same sentence). For the latter problem it seems like you would need to split it up into multiple text runs and switch fonts each time the language changes. That definitely seems like it belongs in an upper layer. And in that case Zend_Pdf need only add support for adding and using CID fonts while the upper layers handle the rest.
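
The run-splitting idea might look roughly like the Python sketch below. It is not part of Zend_Pdf or FPDF: the function names, the font names in the table, and the rule that whitespace stays with the preceding run are all assumptions. Classification is by Unicode block only, so it cannot tell Japanese Kanji from Chinese (both land in "han"); that ambiguity would still need a language hint from the caller.

```python
def script_of(ch):
    # Coarse classification by Unicode block; anything unlisted is "latin".
    cp = ord(ch)
    if 0x0E00 <= cp <= 0x0E7F:
        return "thai"
    if 0x3040 <= cp <= 0x30FF:   # Hiragana and Katakana
        return "kana"
    if 0xAC00 <= cp <= 0xD7A3:   # Hangul syllables
        return "hangul"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs
        return "han"
    return "latin"


def split_runs(text):
    # Group consecutive characters of the same script into (script, text) runs.
    # Whitespace is appended to the current run instead of starting a new one.
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script == runs[-1][0] or ch.isspace()):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


# Hypothetical font table; the names are placeholders, not real resources.
FONT_FOR_SCRIPT = {
    "latin": "Helvetica",
    "han": "SomeCJKFont",
    "kana": "SomeCJKFont",
    "hangul": "SomeKoreanFont",
    "thai": "SomeThaiFont",
}

runs = split_runs("abc \u65e5\u672c\u8a9e \ud55c\uad6d\uc5b4")
plan = [(FONT_FOR_SCRIPT[script], chunk) for script, chunk in runs]
```

An upper layer would then set the font and draw each chunk in turn, which leaves the drawing layer needing only working CID font support.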

I began the work necessary for full Unicode support in the framework last year with the introduction of TrueType fonts. I now have a paying project which will allow me to return to this code and complete it. I will have more to report in this area in the coming weeks.

I'll be interested to see what you come up with and to share my work with you. Are you planning on working on CID font support?


Just curious: if that function works correctly, why has such a basic function as text measuring not been included in Zend_Pdf, given that the work has already been done?

All of the primitives exist in the framework, and you'll notice that ZF-313 is over a year old. Anyone who requires this support for their particular project can easily do the calculations themselves. The question comes up every two or three months and we generally direct those people to that function.

The primary reason such a function does not exist in the framework right now is support. The function I wrote for ZF-313 works, but does not do the calculation in the most efficient way possible, particularly if you're trying to use it to determine where to break a line (it's a Shlemiel the painter's algorithm -- http://www.joelonsoftware.com/articles/fog0000000319.html ). Furthermore, once a function appears in the framework, there are backwards-compatibility issues that must be considered.
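
The inefficiency can be sketched in Python. This is neither the ZF-313 function nor Zend_Pdf code: `width_of` is a hypothetical measuring callback, and the fixed 6-units-per-character width is made up. The first function re-measures the whole candidate line every time a word is added; the second measures each word once and keeps a running total.

```python
def wrap_remeasure(words, width_of, max_width):
    # Shlemiel-style: every new word triggers a full re-measure of the line.
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and width_of(" ".join(candidate)) > max_width:
            lines.append(" ".join(current))
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(" ".join(current))
    return lines


def wrap_running(words, width_of, max_width):
    # Measure each word once and accumulate the width of the current line.
    # Assumes widths are additive (no kerning across word boundaries).
    space = width_of(" ")
    lines, current, used = [], [], 0.0
    for word in words:
        w = width_of(word)
        needed = w if not current else used + space + w
        if current and needed > max_width:
            lines.append(" ".join(current))
            current, used = [word], w
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines


def fixed_width(s):
    # Stand-in for a real glyph-width lookup: 6 units per character.
    return len(s) * 6.0


words = "the quick brown fox jumps over the lazy dog".split()
slow = wrap_remeasure(words, fixed_width, 60.0)
fast = wrap_running(words, fixed_width, 60.0)
```

Both produce the same line breaks under the additive-width assumption; the difference is that the first does work proportional to the line length for every word it considers.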

Since 95% of the people who ask the question, "How do I measure a line of text?" are really asking, "How do I wrap the long lines of a paragraph?" or "How do I center this title?", creating and supporting a string measurement function is the wrong problem for the framework to solve.

Instead, the framework should provide layout classes that make it drop-dead simple to wrap long lines of text and center titles on a page. This is what I've been focusing on. My goal is to not have to create a text measurement function for the framework because the layout classes provided make it unnecessary.

Yes, now that I look at it more closely and think about my text wrapping algorithm it will end up being MUCH faster to use those lower level functions. Thanks for the explanation.

Well, now I've got to decide how to prioritize. I want to keep making progress in cleaning up and publishing my XML/DOM code, but the Unicode / CID font work is much more pressing for my business concerns (i.e. it's paid). So I think I'll try working on both for a while and see what happens.

Thanks again for all the info, I hope our current work overlaps so we can help each other out.

Rick
