Sam Tregar <[EMAIL PROTECTED]> writes:

> As far as I know the character-set conversions are not necessary to
> achieve this goal, so they weren't included.

You are correct.  No general-purpose HTML quoting function handles
internationalization, for two reasons:

* It's not necessary to achieve the primary purpose of quoting, which
  is to prevent HTML metacharacters from being interpreted as markup.

* It's extremely hard to implement without making simplistic
  assumptions.  Handling of I18N text is highly context-dependent.
  It may seem "correct" to change the character 220 to "&Uuml;".  But
  what if the target template is in a different charset, where 220
  has a wholly different meaning?

  For example, in Latin 1, the character 169 is the copyright sign,
  with entities "&copy;" and "&#169;".  But in a Latin 2 HTML
  document, exactly the same code represents the "S with caron"
  character, with entities "&Scaron;" and "&#352;".  In UTF-8, a lone
  byte with that value is not a valid character at all.

  How is a quoting function to know whether to convert code 169 to
  "&copy;" or to "&Scaron;"?

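To make the ambiguity concrete, here is a small Python sketch (Python
is my choice for illustration only, and numeric_entity is a
hypothetical helper, not part of any library).  It decodes the single
byte 169 under each of the three charsets mentioned above:

```python
# Illustration only: one byte value, three charsets, three outcomes.
raw = bytes([169])

as_latin1 = raw.decode("iso8859_1")  # copyright sign, U+00A9
as_latin2 = raw.decode("iso8859_2")  # S with caron, U+0160


def numeric_entity(ch):
    """Hypothetical helper: numeric character reference for one character."""
    return "&#%d;" % ord(ch)


latin1_entity = numeric_entity(as_latin1)
latin2_entity = numeric_entity(as_latin2)

# A lone byte 169 is a UTF-8 continuation byte, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_valid = True
except UnicodeDecodeError:
    utf8_valid = False

print(latin1_entity, latin2_entity, utf8_valid)
```

The byte alone carries no hint about which of these results is the
right one; only the surrounding document's charset does.
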
A quoting function that tried to fully handle I18N would have to know
everything about charsets and HTML and the surrounding context.  Doing
that kind of work for no gain is pointless.  Doing the simple thing
and assuming Latin 1 is actually *harmful* for non-Latin 1 users.
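
For comparison, a quoting function that does only the primary job
might look like the following sketch.  It is written in Python for
illustration (quote_html is a hypothetical name, and this is not how
any particular templating module implements it); it relies on the
standard library's html.escape, which rewrites the metacharacters and
leaves all other characters untouched:

```python
import html


def quote_html(text):
    """Hypothetical wrapper: escape only the HTML metacharacters.

    Characters outside the metacharacter set pass through unchanged,
    so no assumption is made about the document's charset.
    """
    return html.escape(text, quote=True)


sample = 'Caf\u00e9 <b>"R&D"</b>'
quoted = quote_html(sample)
print(quoted)
```

The accented character comes out exactly as it went in, which is the
point: the function never has to guess what it means.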


_______________________________________________
Html-template-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/html-template-users
