Re: [Koha-devel] Diacriticals, Unicode, and PDF's

Chris Nighswonger Mon, 28 Sep 2009 18:22:20 -0700

Hi Mason,

On Mon, Sep 28, 2009 at 7:40 PM, Mason James
<mason.loves.su...@gmail.com> wrote:
>
> so - i'm curious... is there a newer/better way to get around the
> less-than-perfect character-conversion issues with UTF to PDF, that were
> discussed on the lists in the last year or so (approx)


The UTF to PDF conversion issue appears to be caused primarily by the
fact that the PDF stream uses glyph IDs rather than Unicode to display
strings. Thus there is no direct, one-to-one Unicode-to-glyph-ID
relationship. That *some* Unicode chars come across ok is more
ascribable to chance than to design: it happens when the Unicode value
*happens* to match the font's glyph ID. What really should be
happening is that a "ToUnicode" table is built and embedded in the PDF
file so that the relationship between Unicode and glyph IDs is
properly defined.
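
To illustrate the mismatch, here is a toy sketch in Python. The glyph
order below belongs to an imaginary font and is purely hypothetical; it
only shows that a glyph's index within a font need not equal the
Unicode code point of the character it draws.

```python
# Hypothetical glyph order for an imaginary font: list index == glyph ID.
GLYPH_ORDER = [".notdef", "space", "A", "B", "eacute"]

# Hypothetical glyph-name-to-character table for the same imaginary font.
CHAR_FOR_GLYPH = {"space": " ", "A": "A", "B": "B", "eacute": "\u00e9"}


def glyph_id_for(char):
    """Return the glyph ID that draws `char`, or 0 (.notdef) if there is none."""
    for gid, name in enumerate(GLYPH_ORDER):
        if CHAR_FOR_GLYPH.get(name) == char:
            return gid
    return 0


if __name__ == "__main__":
    for ch in "A\u00e9":
        print(f"U+{ord(ch):04X} -> glyph ID {glyph_id_for(ch)}")
```

In this made-up font, writing the code point straight into the PDF
stream would select the wrong glyph (or none at all) for both chars.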

Logically, the next question is: How is this to be accomplished?

The answer is: I have no concrete idea atm.

I *think* that the first issue at hand is that the "standard 14 fonts"
do not extend far enough into the Unicode char set to be usable
afaict. So we will need to use fonts which do (e.g. GNU FreeFont,
http://www.gnu.org/software/freefont/).

The second issue is to understand how ISO 32000-1 defines building a
ToUnicode CMap (sect 9.10.3) and grind out some code to construct
these. That probably means more modifications to PDF::Reuse; I have
already made a number which the maintainer has agreed to include in
the next release toward the end of October. It may be as simple as
embedding Unicode TTFs in the PDF file. If that is the case, the code
for that is already in place in both PDF::Reuse and PDF::API2. I'm not
convinced that the solution is anywhere near that simple, or it would
have been done by now.
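
As a rough idea of what constructing one of these might involve, here
is a minimal sketch in Python (not Perl, and not code from PDF::Reuse
or PDF::API2). It emits the text of a ToUnicode CMap using only
bfchar entries, following the layout of the example in sect 9.10.3.
The function name and the glyph-ID-to-character dict are hypothetical,
two-byte character codes are assumed, and the result would still have
to be embedded as a stream and referenced from the font dictionary.

```python
def to_unicode_cmap(glyph_to_char):
    """Return ToUnicode CMap text for a {glyph_id: unicode_string} mapping.

    Assumes two-byte character codes (0x0000-0xFFFF) in the content stream.
    """
    def utf16_hex(text):
        # Destination values in a ToUnicode CMap are UTF-16BE, written as hex.
        return text.encode("utf-16-be").hex().upper()

    entries = [
        f"<{gid:04X}> <{utf16_hex(char)}>"
        for gid, char in sorted(glyph_to_char.items())
    ]

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]

    # A single bfchar section may hold at most 100 entries, so emit in chunks.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")

    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical glyph IDs; real ones would come from the embedded font.
    print(to_unicode_cmap({0x0003: "\u00e9", 0x3A51: "\U0002003E"}))
```

Getting real glyph IDs out of the font (from its cmap table) is the
part this sketch leaves out entirely.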

But this is all somewhat subject to sudden and dramatic change as I'm
still very much on the PDF learning curve and could be way off target.

I have had some correspondence with an individual who is a platform
architect at Adobe and who has kindly offered to help clarify any
questions regarding Unicode and PDF.

Any thoughts, information, suggestions, etc. are most gratefully appreciated.

Kind Regards,
Chris
_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha.org
http://lists.koha.org/mailman/listinfo/koha-devel
