Joaquín,

This is essentially a font problem.  The canonical Unicode encodings for 
various quote characters include:

#define UCS_LQUOTE                      ((UT_UCSChar)0x2018)
#define UCS_RQUOTE                      ((UT_UCSChar)0x2019)
#define UCS_LDBLQUOTE           ((UT_UCSChar)0x201c)
#define UCS_RDBLQUOTE           ((UT_UCSChar)0x201d)

We're currently importing these as-is from Word, which is the correct 
behavior.  These are *very* common characters, so on platforms which have 
Unicode-aware fonts, everything Just Works.  

The trouble starts when using fonts which do *not* have entries at a 
given codepoint.  In this situation, depending on your platform, the font 
renderer will draw a slug or whitespace or whatever for each unmatched 
character.  

I'm not totally up to speed on the state of the Unix font-handling code, but 
I'm pretty sure that most (or all) of the Unix fonts we're using do *not* 
have entries at these codepoints, which would explain the behavior you're 
reporting. 

There are two categories of valid solutions to this kind of problem:

1.  Fix the fonts.  
==================
Over the long term, this is clearly the ideal solution.  It would be a 
*very* useful project to remap existing fonts so the glyphs are indexed to 
the correct Unicode codepoints (instead of whatever charset they're 
currently encoded in).  

Unlike attempts to create *new* Unicode fonts, there's no need for 
typographic skills here.  Basically, you'd just need code which knew how to: 

  - open up a font file, 
  - recognize the current charset / encoding, 
  - remap the index for each code point (probably using libiconv), 
  - and then save the "Unicoded" font back out to a new file,
  - being sure to update the charset indicator for that new font.  ;-) 
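
To make the remapping step a bit more concrete, here is a rough sketch in C.  
Everything in it is an assumption for illustration: the UnicodeEntry struct, 
the remap_font() function, and the idea that the legacy font is Windows-1252 
with a flat 256-entry glyph table.  The hand-written table only covers ASCII, 
the Latin-1 range, and the four quote characters; a real tool would read an 
actual font format and would probably get the mapping from libiconv instead.

```c
/* Sketch only: re-index an 8-bit legacy glyph table by Unicode codepoint.
 * All names here are hypothetical; no real font file format is parsed. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t codepoint;  /* Unicode codepoint the glyph should be indexed at */
    uint16_t glyph;      /* glyph index inside the font (left unchanged) */
} UnicodeEntry;

/* Map one Windows-1252 byte to Unicode.  Returns 1 on success, 0 if this
 * sketch has no mapping for the byte (most of 0x80..0x9F is left out). */
static int cp1252_to_unicode(uint8_t byte, uint32_t *out)
{
    switch (byte) {
    case 0x91: *out = 0x2018; return 1;  /* left single quote  */
    case 0x92: *out = 0x2019; return 1;  /* right single quote */
    case 0x93: *out = 0x201C; return 1;  /* left double quote  */
    case 0x94: *out = 0x201D; return 1;  /* right double quote */
    default:
        if (byte < 0x80 || byte >= 0xA0) {
            *out = byte;                 /* same value in Unicode */
            return 1;
        }
        return 0;
    }
}

/* legacy_glyphs[b] is the glyph index stored for byte b (0 = no glyph).
 * Writes at most max_out entries to out and returns how many were written. */
static size_t remap_font(const uint16_t legacy_glyphs[256],
                         UnicodeEntry *out, size_t max_out)
{
    size_t n = 0;
    assert(legacy_glyphs != NULL && out != NULL);
    for (int b = 0; b < 256; b++) {
        uint32_t cp;
        if (legacy_glyphs[b] == 0)
            continue;                    /* font has no glyph at this byte */
        if (!cp1252_to_unicode((uint8_t)b, &cp))
            continue;                    /* no known Unicode equivalent */
        if (n == max_out)
            break;
        out[n].codepoint = cp;
        out[n].glyph = legacy_glyphs[b];
        n++;
    }
    return n;
}
```

The resulting table is what you'd write back out as the new font's 
character map, along with the updated charset indicator.  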

I haven't tried to do this, but I suspect the biggest obstacle here would 
be the IP issues, if any.  

2.  Add code workarounds to let people use "broken" fonts.  
==========================================================
However, in the meantime, people may want to investigate font-substitution 
and character-substitution tricks in our text measurement and rendering 
code.  Essentially what you'd be doing here would be recognizing cases where 
a Unicode character couldn't get rendered in a given font, and instead using 
either:

  - the same codepoint in a different font, or
  - a different codepoint in the same font.
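
As a rough sketch of how those two fallbacks might be ordered, assuming a 
made-up Font type that simply lists the codepoints it covers (none of this 
is the real Graphics-layer API, and choose_glyph(), font_has_glyph() and 
ascii_fallback() are hypothetical names):

```c
/* Sketch only: pick a font/codepoint pair that can actually be drawn.
 * Font, GlyphChoice and all functions below are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t UT_UCSChar;             /* width assumed for this sketch */

#define UCS_LQUOTE     ((UT_UCSChar)0x2018)
#define UCS_RQUOTE     ((UT_UCSChar)0x2019)
#define UCS_LDBLQUOTE  ((UT_UCSChar)0x201c)
#define UCS_RDBLQUOTE  ((UT_UCSChar)0x201d)

typedef struct {
    const UT_UCSChar *covered;           /* codepoints this font has glyphs for */
    size_t count;
} Font;

typedef struct {
    const Font *font;                    /* font to draw with */
    UT_UCSChar ch;                       /* codepoint to draw in that font */
} GlyphChoice;

static int font_has_glyph(const Font *f, UT_UCSChar c)
{
    for (size_t i = 0; i < f->count; i++)
        if (f->covered[i] == c)
            return 1;
    return 0;
}

/* "A different codepoint in the same font": plain ASCII stand-ins for the
 * quotes.  Returns c unchanged when there is no stand-in. */
static UT_UCSChar ascii_fallback(UT_UCSChar c)
{
    switch (c) {
    case UCS_LQUOTE:
    case UCS_RQUOTE:    return 0x27;     /* ' */
    case UCS_LDBLQUOTE:
    case UCS_RDBLQUOTE: return 0x22;     /* " */
    default:            return c;
    }
}

static GlyphChoice choose_glyph(const Font *primary, const Font *fallback,
                                UT_UCSChar c)
{
    GlyphChoice r = { primary, c };
    assert(primary != NULL);
    if (font_has_glyph(primary, c))
        return r;                        /* nothing to work around */
    if (fallback != NULL && font_has_glyph(fallback, c)) {
        r.font = fallback;               /* same codepoint, different font */
        return r;
    }
    UT_UCSChar alt = ascii_fallback(c);
    if (alt != c && font_has_glyph(primary, alt))
        r.ch = alt;                      /* different codepoint, same font */
    return r;                            /* else: renderer draws its slug */
}
```

Note that whichever pair comes back has to be used for both measuring and 
drawing, or the layout won't match what ends up on screen.  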

If anyone's interested in going down this path, remember that the code 
needed should probably be isolated either in the Graphics layer, or even in 
the underlying platform APIs being called.  

Note, however, that this could start turning into a heck of a lot of code.  
Worse, the performance implications of doing extra work at measurement and 
drawing time can be severe for a GUI-intensive app.  

In essence, you'd be doing the same work as in option #1, but interactively, 
instead of once per font.  

Paul



