[abcusers] Unicode and UTF8?

Steven Bennett Wed, 23 Jul 2003 10:22:18 -0700

Has anyone given any thought to supporting Unicode (for example, UTF-8) files
in the new standard?  Much of my text editing nowadays (I work on Mac OS X...)
produces Unicode-encoded text files by default, and I think this is going to
become more commonplace over the next few years.  Also, supporting Unicode
would open ABC up to more languages.


To accomplish this in a parser, I would actually convert the file to Unicode
(if not already) at the very beginning and parse the whole thing in Unicode.
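
A minimal sketch of that first step, in Python. The function name `decode_abc` is hypothetical, and the Latin-1 fallback for legacy files is an assumption; nothing here is from an existing ABC parser.

```python
import codecs


def decode_abc(raw: bytes) -> str:
    """Return the raw file contents as a Unicode string (hypothetical helper)."""
    # A byte-order mark, if present, identifies the encoding.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return raw.decode("utf-16")
    try:
        # Plain ASCII is also valid UTF-8, so existing ASCII files pass through.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is a legacy Latin-1 file.
        return raw.decode("latin-1")
```

Everything after this call would work on the returned string, never on bytes.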

One minor change to the spec would be how we deal with special character and
continuation processing with the backslash -- I'd do that immediately after
everything is converted to Unicode, before any other processing of the file.
This results in some subtle changes in interpretation, which means a line
like:

    \101: Steve Bennett

...would now be interpreted the same as:

    A: Steve Bennett

...which I think is a behavioral change for existing programs, but shouldn't
have much of a side effect otherwise.
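
To make the ordering concrete, here is a rough Python sketch that expands octal escapes before any field is recognized. It assumes an escape is a backslash followed by exactly three octal digits; `expand_octal_escapes` is a made-up name, and line continuation is left out.

```python
import re

# Assumption: an octal escape is a backslash plus exactly three octal digits.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def expand_octal_escapes(text: str) -> str:
    """Replace each octal escape with the character it names."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


# Escapes are expanded first, so the field letter is only looked at afterwards.
line = expand_octal_escapes(r"\101: Steve Bennett")
field, _, value = line.partition(":")
```

Octal 101 is decimal 65, the code for "A", so `field` ends up as "A" and the line is handled as an A: field.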

(I'm not sure how to deal with the "\-", which inserts a hard hyphen in a W:
or w: field -- maybe it's the only thing not translated, or maybe it gets
converted to a Unicode 00AD, which is a Soft Hyphen, although that's kind of
reversed in meaning.  Or maybe Unicode 2011, which is a Non-Breaking Hyphen,
but again that's not *quite* the meaning...)
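
The options for "\-" could be sketched like this in Python. The helper `translate_hard_hyphen` is hypothetical; the default leaves the escape untouched, and the caller may pick one of the two Unicode candidates instead.

```python
import unicodedata

# The two candidate code points mentioned above.
SOFT_HYPHEN = "\u00ad"
NON_BREAKING_HYPHEN = "\u2011"


def translate_hard_hyphen(lyrics, replacement=None):
    """Replace each backslash-hyphen escape, or leave it alone if no replacement is given."""
    if replacement is None:
        return lyrics  # option 1: the only escape that is not translated
    return lyrics.replace("\\-", replacement)


# Look up the official Unicode names of the two candidates.
candidate_names = [unicodedata.name(c) for c in (SOFT_HYPHEN, NON_BREAKING_HYPHEN)]
```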

I also suggest adding a Unicode escape to allow just about any Unicode
character in an ASCII-encoded file.  Something like:

    \U262F

...where "262F" is the hexadecimal value for the Unicode character to
insert, in this case a Yin Yang symbol (the Peace symbol is 262E).
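
A possible reading of that escape, sketched in Python. It assumes exactly four hexadecimal digits after "\U", which the proposal does not pin down, and `expand_unicode_escapes` is a hypothetical name.

```python
import re

# Assumption: the escape is a backslash, "U", and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\U([0-9A-Fa-f]{4})")


def expand_unicode_escapes(text: str) -> str:
    """Replace each backslash-U escape with the code point it names."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# An ASCII-only source line that yields a non-ASCII character after expansion.
expanded = expand_unicode_escapes(r"W: \U262F")
```

With a fixed four digits, only code points up to FFFF are reachable; a longer or delimited form would be needed for anything beyond that.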

Of course, an ABC file already in Unicode form could allow any Unicode
character to be included in any text field, such as the Word fields, etc.

Any comments?

-->Steve Bennett

