Hi,

I should correct some statements I made before.  This draws mostly
from the PDF reference, v. 1.7.

I maintain that the relation between glyphs and Unicode sequences
supposed by the AGLFN is a bad idea and essentially broken.  That
said, however:

1) My reference to "embedding Unicode text" seems to have been a trick
of my imagination, probably a conflation of the PDF technology of
embedded fonts with that of embedded documents.  After looking for
several evenings now, I can't find anything like the idea that seemed
so clear to me: Unicode text embedded in PDF, with references from
positioned glyphs into that text.  (I didn't *think* I was making this
up -- I seemed vividly to remember reading about it, and even doing
it! Disturbing, yes.)

Sure, embedded text would be a working solution, but it seems not to
have been implemented at all in current PDF technologies.

2) While the existing PDF "ToUnicode" mapping is incapable of
reproducing the original text, and while it may mangle text in complex
scripts such as Indic ones, it seems to be the *only* existing
technology for extracting text from a PDF document.

Furthermore, in most cases, at least for simple alphabetic scripts,
the text produced by the "ToUnicode" mapping would be meaningful.
For more complex scripts, the worst case does occur, but rarely:
most sentences produced would be readable.

That is, if the whole system really works as advertised.
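
To make the "mangling" concrete, here is a minimal Python sketch of
what I understand a per-glyph "ToUnicode" lookup to do.  The glyph
codes and the table are hypothetical, and I am assuming the renderer
writes glyphs in visual order, as shapers normally do for the
Devanagari vowel sign i, which is drawn before its consonant.

```python
# Hypothetical glyph codes and "ToUnicode" table for a Devanagari font.
TO_UNICODE = {
    0x10: "\u0915",  # DEVANAGARI LETTER KA
    0x11: "\u093F",  # DEVANAGARI VOWEL SIGN I
}


def extract_text(glyph_codes):
    """Map each glyph code to its Unicode string, in drawing order."""
    return "".join(TO_UNICODE.get(code, "\uFFFD") for code in glyph_codes)


# Original text in logical order: KA followed by VOWEL SIGN I.
original = "\u0915\u093F"

# Assumed drawing order after shaping: the vowel sign glyph comes first.
drawn = [0x11, 0x10]

extracted = extract_text(drawn)
print(extracted == original)
```

Every character is recovered, but in visual rather than logical
order, so the result differs from the source text even though each
single lookup is "correct".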

3) The PDF "ToUnicode" stream could or should be produced by the font
layout engine based on font table entries.  Instead, evidently, the
incorrect, restrictive and ugly AGLFN is the way current PDF software
is supposed to get the info to populate the "ToUnicode" entries.

I'm still working on a summary of the issues and technologies.

The AGLFN (unfortunate though it is) represents the only way currently
proposed to effect the secondary, but important, function of copying
text from a rendered PDF document. So I'm now working on a way to
apply AGLFN names to FreeFont auxiliary glyphs at build time.

The next question is:
    Does it work for our users?
If the font's auxiliary glyphs are named as Adobe specifies, and
standard Unix/Linux tools are used to create a PDF with the font
embedded, will the auxiliary glyphs be (somehow) converted to text?
