Re: Special Characters

Joerg Pietschmann 30 Dec 2002 21:19:23 -0000

On Monday 30 December 2002 16:55, you wrote:
> My problem is that sometimes FOP creates an invalid PDF file
> because of a special character (such as the Euro sign, for example).  Each
> time this happens, I look for that character in my code and replace it with
> an equivalent character or the Unicode representation.
>
> My question is, is there a known list of characters which will cause FOP to
> fail?


How do you define an "invalid PDF file"? Does Acrobat Reader refuse
to open such a file, does FOP raise an error, or does the file just contain
a "#" or other unexpected glyphs?

Firstly, you should probably learn about character encodings in XML.
An XML file normally starts with an XML declaration:
  <?xml version="1.0"?>
A declaration without an encoding, like the one above, means the file is
taken to be UTF-8 encoded. You can't, for example, edit this file with
Windows Notepad and enter a Euro sign (unless you have W2K or later and
use "save as UTF-8").
You can put an encoding into the XML declaration, like
  <?xml version="1.0" encoding="ISO-8859-1"?>
which means the file is ISO-8859-1 (Latin-1) encoded, one of the most
widely used character encodings. However, you still can't edit this file
with Windows Notepad and enter a Euro sign, because Windows usually
uses an extension of ISO-8859-1 called CP1252 or something similar.
Contrary to the UTF-8 example above, where the stray byte is not valid
UTF-8 and the parser can reject it, nothing will catch this: the byte is a
legal ISO-8859-1 character without a glyph, and FOP will simply output
a "#".
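
To see why the CP1252 case slips through, here is a small Python sketch
(not part of FOP, and the helper name decodes_as_utf8 is made up for
illustration). It shows that a Euro sign saved as CP1252 is the single byte
0x80, that reading this byte as ISO-8859-1 silently yields the control
character U+0080, and that the same byte is not valid UTF-8.

```python
# The Euro sign as an editor saving in Windows CP1252 would write it.
euro_cp1252 = "\u20ac".encode("cp1252")       # the single byte 0x80

# Reading that byte as ISO-8859-1 raises no error: it becomes U+0080,
# a C1 control character for which a font has no glyph.
misread = euro_cp1252.decode("iso-8859-1")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(euro_cp1252)                   # raw CP1252 bytes
    print(f"U+{ord(misread):04X}")       # code point seen under ISO-8859-1
    print(decodes_as_utf8(euro_cp1252))  # a lone 0x80 is not valid UTF-8
```
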
If in doubt, restrict yourself to the ASCII character range and enter each
Unicode character outside this range as an XML character reference. The
Euro sign, for example, is &#x20AC; (including the semicolon). See
  http://www.unicode.org/charts/charindex.html
for a lookup table.
Alternatively, get an XML-aware editor and use it exclusively.
See also the XML spec for more info:
  http://www.w3.org/TR/REC-xml
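
If you generate the XML from a program, the "ASCII plus character
references" advice can be automated. The following is only a minimal
Python sketch; the function name to_char_refs is made up for illustration,
and it deliberately leaves markup characters such as "&" and "<" alone,
so those still need the usual XML escaping.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal XML
    character reference, leaving ASCII characters untouched."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


if __name__ == "__main__":
    print(to_char_refs("Price: 10 \u20ac"))
    # The standard library can do the same with decimal references:
    print("Price: 10 \u20ac".encode("ascii", "xmlcharrefreplace"))
```

Hexadecimal and decimal references are equivalent to an XML parser, so
either form gives a file that is safe to edit with any ASCII-capable editor.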

The second problem is that the standard fonts don't contain glyphs for most
Unicode characters. Helvetica (Arial), Times and Courier only have glyphs
for most of the characters from the ISO-8859-1 range (and a few characters
outside this range); Symbol and Zapf Dingbats cover other ranges. See the
fonts.fo file in the FOP example directory tree for details. You can render
this file to PDF and see all characters supported out of the box. If there is
no glyph in the font for the required character, FOP uses "#" or the
equivalent in the font's encoding (it is some other glyph for the Symbol and
Zapf Dingbats fonts).
For all other characters, you have to install a user font containing glyphs
for the characters you want to display. See docs/html-docs/font.html for more
info about this topic.
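
As a rough first check before rendering, you can list the characters in
your text that fall outside the ISO-8859-1 range. This Python sketch is
only an approximation under that assumption: it does not read any font,
the function name outside_latin1 is made up, and it will also flag the few
characters outside the range that the standard fonts do cover. The
rendered fonts.fo remains the authoritative list.

```python
def outside_latin1(text: str) -> list:
    """Return the distinct characters above U+00FF, in order of first
    appearance, formatted as U+XXXX strings."""
    seen = []
    for ch in text:
        if ord(ch) > 0xFF and ch not in seen:
            seen.append(ch)
    return [f"U+{ord(ch):04X}" for ch in seen]


if __name__ == "__main__":
    sample = "caf\u00e9 \u20ac \u03a9 \u20ac"
    # Each reported code point is a candidate for needing a user font.
    print(outside_latin1(sample))
```
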

J.Pietschmann
