[Moving over to fop-dev as this is getting technical]
On 30/01/13 15:58, Glenn Adams wrote:
> On Wed, Jan 30, 2013 at 6:44 AM, Neeraj <neerajii...@gmail.com> wrote:
>> Yes, my editor can handle the font used.
>> If you highlight the text in the editor and set the font to Arial, do you
>> see any glyph?
>> For PDF text - No.
>> As for embedding, maybe I added embedding mode "full" later, after the
>> PDF, but in both cases it gives the same result.
>> The issue I reported was for a non-Base14 font. You are using Arial, which
>> is a Base14 font, and FOP has full support for these kinds of fonts.
>> Well, as you said, I tried the same functionality with the Arial font too
>> and found the same issue in a different form.
>> Original Arabic text - هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح
>> ("This is a test comment. The words are written correctly.")
>> PDF Arabic text - ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح
>> If I compare the PDF and MS Word files, they look exactly the same, but
>> when I copy the text to an editor (with the font supported), the words look
>> different (glyphs are missing). You can check the text above.
>> Why am I losing text when I copy and paste?
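A quick way to see what "missing glyphs" means here is to compare the two sample lines code point by code point. The sketch below is an illustration only: the function name diff_code_points is hypothetical, and it assumes the two strings were copied from the message unchanged. On these samples it should show that nothing is actually dropped. The first letter of هذا comes back as U+06BE ARABIC LETTER HEH DOACHASHMEE instead of U+0647 ARABIC LETTER HEH, and the yeh in تعليق and صحيح comes back as U+06CC ARABIC LETTER FARSI YEH instead of U+064A ARABIC LETTER YEH. In many Arabic fonts these pairs look identical in the joined (initial or medial) positions used here, so one plausible but unconfirmed explanation is that the glyph-to-Unicode mapping used for text extraction picks the wrong one of two code points that share a glyph.

```python
import unicodedata

# The two sample lines from the message: the source text, and the same text
# after copying it out of the FOP-generated PDF.
ORIGINAL = "هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح"
PASTED = "ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح"


def diff_code_points(expected: str, actual: str) -> list[tuple[int, int, int]]:
    """Return (index, expected code point, actual code point) for every
    position at which two equal-length strings differ."""
    if len(expected) != len(actual):
        raise ValueError("strings differ in length; compare them another way")
    return [
        (i, ord(e), ord(a))
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


if __name__ == "__main__":
    # Print each substitution with its Unicode character names.
    for index, exp, act in diff_code_points(ORIGINAL, PASTED):
        print(f"{index:2d}: U+{exp:04X} {unicodedata.name(chr(exp))}"
              f" -> U+{act:04X} {unicodedata.name(chr(act))}")
```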
> One thing to keep in mind is that some fonts do not include entries in the
> CMAP table for all glyphs that can be referenced by performing the
> character-to-glyph transformation process. In this case, FOP synthesizes a
> CMAP entry in the embedded font that maps a dynamically generated Unicode
> value in the PUA (Private Use Area) to the glyph. The latter is necessary
> since PDF requires specifying *some* character code (and not a glyph index
> directly) when drawing text.
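The scheme described in that paragraph can be sketched roughly as follows, assuming the font's cmap is modelled as a plain dict from code point to glyph ID. The function names and the allocation order are hypothetical; this is not FOP's actual code. Glyphs produced by shaping that the cmap does not list get consecutive code points from U+E000, so there is some code to draw with, but those codes carry no meaning for a consumer that only sees the extracted text.

```python
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def synthesize_pua_entries(cmap: dict[int, int], glyph_ids: list[int]) -> dict[int, int]:
    """Return a copy of cmap (code point -> glyph ID) in which every glyph in
    glyph_ids that has no entry yet is given a fresh PUA code point."""
    augmented = dict(cmap)
    mapped = set(cmap.values())
    next_code = PUA_START
    for gid in glyph_ids:
        if gid in mapped:
            continue
        while next_code in augmented:  # skip PUA codes the font already uses
            next_code += 1
        if next_code > PUA_END:
            raise OverflowError("BMP Private Use Area exhausted")
        augmented[next_code] = gid
        mapped.add(gid)
        next_code += 1
    return augmented


def glyphs_to_codes(augmented: dict[int, int], glyph_ids: list[int]) -> list[int]:
    """Map each glyph back to one code point for drawing. When several code
    points share a glyph, this sketch arbitrarily keeps the lowest one."""
    reverse: dict[int, int] = {}
    for code, gid in sorted(augmented.items()):
        reverse.setdefault(gid, code)
    return [reverse[gid] for gid in glyph_ids]
```

The arbitrary tie-break in glyphs_to_codes is the weak spot of any glyph-to-code inversion: the font alone cannot say which of several code points a shared glyph originally came from.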
I may be missing something, but I don’t understand this ‘PDF requires
specifying some character code’. AFAIU you can put glyph indices
directly in the PDF string; you just have to specify Identity-H as the
font’s encoding and Identity in the CIDToGIDMap. So I’m not sure why it
is necessary to use codes in the private use area.
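For comparison, the Identity-H approach can be sketched as below. With /Encoding /Identity-H on the Type 0 font, each character code is two bytes, big-endian, and is taken as the CID; with /CIDToGIDMap /Identity on the descendant CIDFontType2 font, the CID is the glyph index. The shown string is therefore just the glyph IDs, with no Unicode value involved. The helper name is hypothetical and the glyph IDs in the tests are made up.

```python
def identity_h_show_text(glyph_ids: list[int]) -> str:
    """Return a Tj operation drawing the given glyphs with a Type 0 font whose
    /Encoding is /Identity-H and whose descendant CIDFontType2 font has
    /CIDToGIDMap /Identity."""
    # Identity-H: every character code is 2 bytes, big-endian, used as the CID.
    # CIDToGIDMap /Identity: the CID is the glyph index in the embedded font.
    for gid in glyph_ids:
        if not 0 <= gid <= 0xFFFF:
            raise ValueError(f"glyph index {gid} does not fit in two bytes")
    codes = "".join(f"{gid:04X}" for gid in glyph_ids)
    return f"<{codes}> Tj"
```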
Then, to make copy-paste work, you ‘just’ have to provide an appropriate
ToUnicode CMap that maps each shaped glyph back to the original Unicode
code point(s).
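A minimal sketch of generating such a ToUnicode CMap, assuming the shaping step can report, for every glyph it emits, the original character(s) that glyph stands for. The function name and the glyph IDs in the tests are hypothetical. A ligature glyph such as lam-alef maps to two code points, which bfchar allows by giving a multi-unit UTF-16BE destination, and entries are split into blocks of at most 100, the limit the CMap format places on a single beginbfchar section.

```python
def to_unicode_cmap(glyph_to_text: dict[int, str]) -> str:
    """Return the text of a ToUnicode CMap mapping 2-byte glyph codes
    (as used with Identity-H) to the Unicode text each glyph represents."""
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    entries = sorted(glyph_to_text.items())
    # A single beginbfchar ... endbfchar section may hold at most 100 entries.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for gid, text in chunk:
            # Destination is the UTF-16BE encoding of the original text, so
            # ligatures and non-BMP characters (surrogate pairs) work too.
            dst = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{gid:04X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)
```

Keying the map on what shaping actually produced, rather than inverting the font's cmap, avoids guessing between code points that share a glyph, as long as a given glyph always stands for the same text within the embedded subset.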
> If you then attempt to copy this text and paste it into another editor that
> isn't aware of this dynamic mapping in the embedded font's CMAP, then you
> may lose that mapping information. One possible way to fix this, which I
> haven't investigated in detail, is to provide a separately encoded Unicode
> string that contains the original, pre-transformed text, and associate this
> string with the displayed post-transformed character string, which may
> contain these dynamic PUA characters. The PDF viewer would then need to use
> the pre-transformed string when performing copy operations. However, I
> haven't researched whether PDF supports this.
> Anyway, I suspect this is what is causing your problem. I've opened a bug
> on this at:
> https://issues.apache.org/jira/browse/FOP-2204
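PDF does define something along the lines Glenn describes: a marked-content sequence can carry an /ActualText entry ("Replacement Text", ISO 32000-1 section 14.9.4) that text extraction and copy-paste are meant to use instead of the codes of the glyphs drawn inside it. Below is a minimal sketch of wrapping shaped output this way. The helper name and the example glyph codes are hypothetical, and whether a given viewer honours ActualText on copy has not been checked in this thread.

```python
def with_actual_text(show_text_ops: str, original_text: str) -> str:
    """Wrap content-stream text operators in a /Span marked-content sequence
    whose /ActualText holds the original, unshaped Unicode text."""
    # ActualText is a PDF text string: encode it as UTF-16BE with a leading
    # FE FF byte order mark, written as a hex string so no escaping is needed.
    actual = "FEFF" + original_text.encode("utf-16-be").hex().upper()
    return f"/Span << /ActualText <{actual}> >> BDC\n{show_text_ops}\nEMC"


if __name__ == "__main__":
    # Two hypothetical shaped glyphs standing for heh + lam-alef.
    print(with_actual_text("<01F002A1> Tj", "\u0647\u0644\u0627"))
```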