On Nov 10, 2007, at 9:29 PM, Willie Alberty wrote:
On Nov 10, 2007, at 3:47 PM, Rick Gigger wrote:
A few questions:
1. Are you going to be using xsl-fo for your layout model?
No. XSL-FO does not provide enough fidelity for my needs. The text
model I am working on is based on the OpenStep (Cocoa) frameworks,
and has an attributed string class at its core. Attributes such as
font, color, size, kerning, line height, tab stops, etc. may be
applied to arbitrary ranges of characters within such strings.
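
As a rough illustration of the attributed-string idea described above, here is a minimal Python sketch. It is not the Zend_Pdf or Cocoa API: the class name AttributedString and the methods add_attributes and attributes_at are hypothetical, chosen only to show how attributes can be attached to character ranges and looked up later.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedString:
    """Hypothetical sketch: plain text plus attribute runs over character ranges."""
    text: str
    # Each run is (start, end, attributes), with end exclusive.
    runs: list = field(default_factory=list)

    def add_attributes(self, start, end, **attrs):
        """Attach attributes to the characters in [start, end)."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("range out of bounds")
        self.runs.append((start, end, attrs))

    def attributes_at(self, index):
        """Merge every run covering index; later runs override earlier ones."""
        merged = {}
        for start, end, attrs in self.runs:
            if start <= index < end:
                merged.update(attrs)
        return merged


s = AttributedString("Hello, world")
s.add_attributes(0, 12, font="Helvetica", size=12)   # whole string
s.add_attributes(7, 12, size=18, color="red")        # just "world"
print(s.attributes_at(0))   # {'font': 'Helvetica', 'size': 12}
print(s.attributes_at(8))   # {'font': 'Helvetica', 'size': 18, 'color': 'red'}
```

A real implementation would probably keep the runs sorted and non-overlapping so lookups are not linear in the number of runs; this sketch favors brevity.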
...
In the end, Zend_Pdf will have three useful layers for document
layout: an FO interpreter for multi-page document flows, the lower-
level layout classes for drawing attributed strings with finer-
grained control, and the existing drawing primitives already in
Zend_Pdf for precise glyph-by-glyph placement.
I wish I had time to just wait for it to all be completed. :) I have
implemented something that is not nearly as robust but that more or
less handles all of my current needs and is working now. I will be
interested to see what you think as I clean up the code and release
it for public use / comments.
2. Does your revised font code involve support for Unicode and/or
non-Latin-1 fonts?
The font code in Zend_Pdf can already handle Unicode character sets.
All fonts used by the framework, whether the standard 14 PDF fonts,
or custom OpenType fonts, extract and use a Unicode character-to-
glyph map for drawing.
The reason that Zend_Pdf cannot currently draw non-Latin-1 strings
lies in the text encoding method used when drawing the glyphs on the
page. Right now, we've hard-coded using the WinAnsiEncoding method
when drawing text (see Zend_Pdf_Resource_Font::encodeString()),
primarily because it's a built-in encoding and doesn't require a lot
of code to support.
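
To make the limitation concrete, here is a small Python sketch of what happens when text is forced through a single-byte Windows encoding. It assumes Python's cp1252 codec as a stand-in for WinAnsiEncoding (the two are closely related but this is not the code in Zend_Pdf_Resource_Font::encodeString()), and the helper name encode_win_ansi is made up for the example.

```python
def encode_win_ansi(text, errors="strict"):
    """Encode text to one byte per character, using cp1252 as an
    approximation of PDF's WinAnsiEncoding (an assumption of this sketch)."""
    return text.encode("cp1252", errors=errors)


# Western European text fits in the single-byte encoding.
print(encode_win_ansi("café"))  # b'caf\xe9'

# Japanese text has no representation at all in this encoding.
try:
    encode_win_ansi("日本語")
except UnicodeEncodeError:
    print("not representable in a single-byte Latin encoding")

# Substituting on error silently loses the text instead.
print(encode_win_ansi("日本語", errors="replace"))  # b'???'
```

So even though the font may contain the glyphs, anything outside the encoding never reaches the page intact.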
Drawing characters outside the Latin-1 range requires several
changes to the font program's resource dictionary. At best, a
/Differences array must be added, and at worst -- especially for CJK
fonts -- a CID font would need to be created. Neither approach is
very easy at this point because of how Zend_Pdf_Resource objects
create and manage their resource dictionary.
The CID fonts are what I am interested in, as my main need here is to
add Japanese and Chinese support. There is a fairly straightforward way
to do this in FPDF, with links to examples on the front page
(http://fpdf.org/). Looking at the font, resource and dictionary code in
Zend_Pdf, I am having trouble seeing where exactly the problem is. It
seems that adding CID font support to Zend_Pdf wouldn't be much harder,
if at all,
than adding it to FPDF. Can you elaborate on what the specific
difficulty is? Perhaps though I just don't understand the code well
enough. I think I might take a crack at adding CID font support here
in the near future and see if I can't get it to work. Maybe then it
will become clear to me what the specific difficulty is. Would you
like to see my code if I do?
Let's assume for a moment that CID font support is complete and
working. Now, someone is using it to create a PDF and calls
$thisPage->drawText(...) and specifies a UTF-8 encoding for the text.
(Remember, this is the future where CID font support exists and drawText
doesn't convert everything down to Latin-1 anymore.) The text has Chinese
characters in it. As far as I can see, it is still the caller's job to
know the nature of the text being added before the call to drawText.
If the current font is set to a Latin-only font, then the Chinese
characters will show up as gibberish or not at all, or an error will be
thrown because the current font doesn't have glyphs for Chinese. So
the caller must know that even though the text is Unicode, it contains
Chinese and so the font must be switched to a font that has the
Chinese glyphs (a CID font).
Is that your intention, or do you plan on actually including in
Zend_Pdf the ability to just add Unicode text and have it pick an
appropriate font, switch to it, convert the text to the
encoding of the font, and then add the text? It seems like this
functionality would make more sense in one of the upper layers that
you spoke of above. Wherever it goes though the real tricky issue in
my mind is as follows: You have a form that a user of your software
fills out. Your site handles and stores all text as UTF-8. Somewhere
you've got to decide what language it's in and what font it needs.
This is made particularly difficult by ambiguous text (for example,
some Japanese Kanji might look just like some Traditional Chinese
characters), and mixed text (let's say you've got some Thai, Japanese,
and Korean all in the same sentence). For the latter problem it seems
like you would
need to split it up into multiple text runs and switch fonts each time
the language changes. That definitely seems like it belongs in an
upper layer. And in that case Zend_Pdf need only add support for
adding and using CID fonts while the upper layers handle the rest.
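
A minimal Python sketch of the "split into runs and switch fonts" idea follows. Everything in it is an assumption made for illustration: the block ranges are deliberately incomplete, the script names are arbitrary labels, and the function names script_of and split_runs are hypothetical.

```python
from itertools import groupby

# Assumed, deliberately incomplete Unicode block ranges for this sketch.
SCRIPT_RANGES = [
    (0x0E00, 0x0E7F, "thai"),
    (0x3040, 0x30FF, "kana"),    # Hiragana and Katakana
    (0x4E00, 0x9FFF, "han"),     # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "hangul"),  # Hangul Syllables
]


def script_of(ch):
    """Classify one character by code point range."""
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "latin"  # fallback bucket for everything else in this sketch


def split_runs(text):
    """Group consecutive characters of the same script into runs."""
    return [(script, "".join(chars))
            for script, chars in groupby(text, key=script_of)]


for script, run in split_runs("abcไทยかな한글"):
    print(script, run)
```

The upper layer would then select a font per run before drawing it. Note that a "han" run is exactly the ambiguous case described above: the code points alone do not say whether the text is Japanese or Chinese, so some other hint (surrounding kana, a language tag on the form) is still needed.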
I began the work necessary for full Unicode support in the framework
last year with the introduction of TrueType fonts. I now have a
paying project which will allow me to return to this code and
complete it. I will have more to report in this area in the coming
weeks.
I'll be interested to see what you come up with and to share my work
with you. Are you planning on working on CID font support?
Just curious: if that function works correctly, why has such a basic
function as text measuring not been included in Zend_Pdf if the
work has already been done?
All of the primitives exist in the framework, and you'll notice that
ZF-313 is over a year old. Anyone who requires this support for
their particular project can easily do the calculations themselves.
The question comes up every two or three months and we generally
direct those people to that function.
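
For readers who have not seen the calculation, here is a rough Python sketch of the usual approach: sum the glyph advance widths in font units, then scale by the font size over the units per em. It is not the ZF-313 function. The width table and UNITS_PER_EM value are illustrative numbers rather than data read from a real font file, and text_width and centered_x are hypothetical names.

```python
# Illustrative advance widths in font units (not read from a real font).
GLYPH_WIDTHS = {"T": 611, "i": 222, "t": 278, "l": 222, "e": 556}
UNITS_PER_EM = 1000  # assumed design grid for this sketch


def text_width(text, font_size):
    """Width of text in points: sum of advances scaled to the font size."""
    units = sum(GLYPH_WIDTHS[ch] for ch in text)
    return units * font_size / UNITS_PER_EM


def centered_x(text, font_size, page_width):
    """Left edge that centers the text horizontally on the page."""
    return (page_width - text_width(text, font_size)) / 2


width = text_width("Title", 12)
print(round(width, 3))                          # 22.668
print(round(centered_x("Title", 12, 612), 3))   # 294.666
```

This ignores kerning and character spacing, which a complete layout class would have to account for.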
The primary reason such a function does not exist in the framework
right now is support. The function I wrote for ZF-313 works, but
does not do the calculation in the most efficient way possible,
particularly if you're trying to use it to determine where to break
a line (it's a Shlemiel the painter's algorithm --
http://www.joelonsoftware.com/articles/fog0000000319.html).
Furthermore, once a function appears in the framework, there are
backwards-compatibility issues that must be considered.
Since 95% of the people who ask the question, "How do I measure a
line of text?" are really asking, "How do I wrap the long lines of a
paragraph?" or "How do I center this title?", creating and
supporting a string measurement function is the wrong problem for
the framework to solve.
Instead, the framework should provide layout classes that make it
drop-dead simple to wrap long lines of text and center titles on a
page. This is what I've been focusing on. My goal is to not have to
create a text measurement function for the framework because the
layout classes provided make it unnecessary.
Yes, now that I look at it more closely and think about my text
wrapping algorithm, it will end up being MUCH faster to use those
lower-level functions. Thanks for the explanation.
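
To show why, here is a minimal greedy line-wrapping sketch in Python. It measures each word once and keeps a running width for the current line, instead of re-measuring the whole line from the start every time a word is added (the Shlemiel pattern). The function name wrap_words and its parameters are hypothetical, and the demo uses character count as a stand-in for a real width function.

```python
def wrap_words(words, word_width, space_width, max_width):
    """Greedy wrap: each word is measured once and a running total is kept."""
    lines, current, current_width = [], [], 0
    for word in words:
        w = word_width(word)
        # A space is only needed when the line already has a word on it.
        extra = w if not current else space_width + w
        if current and current_width + extra > max_width:
            lines.append(" ".join(current))
            current, current_width = [word], w
        else:
            current.append(word)
            current_width += extra
    if current:
        lines.append(" ".join(current))
    return lines


# Demo with a fake metric: every character is one unit wide.
words = "the quick brown fox jumps".split()
print(wrap_words(words, word_width=len, space_width=1, max_width=10))
# ['the quick', 'brown fox', 'jumps']
```

A word wider than max_width is placed on a line by itself rather than split; hyphenation and other break rules are left out of this sketch.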
Well now I've got to decide on how to prioritize. I want to keep
making progress in cleaning up and publishing my XML/DOM code but the
Unicode / CID font work is much more pressing for my business concerns
(i.e., it's paid). So I think I'll try working on both for a while and
see what happens.
Thanks again for all the info, I hope our current work overlaps so we
can help each other out.
Rick