This is a discussion that is strictly academic at this time.  Please do not
think that we are close to supporting any of this stuff.  Rather, I want to
start a dialog about how people want this to work, so when we do go ahead
and design it, people will not think it came out of left field.

So the question on the table is string encodings. 

The input line: Right now, epic doesn't handle encoding on the input line;
it just assumes that each byte is one code point.  For people using utf-8,
one keypress yields one code point, which may be encoded as multiple bytes,
and those bytes show up as multiple (incorrect) characters in the input line
rather than as the key that was pressed.  Column counting is not broken
/as such/.
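To make the input line problem concrete: the terminal hands the client one byte at a time, so handling utf-8 input means buffering bytes until a whole code point has arrived, and only then inserting one character into the input line. A minimal sketch in C might look like this. The struct and function names are hypothetical (not existing epic code), and the decoder deliberately skips some validation.

```c
#include <stdint.h>

/*
 * Hypothetical per-input-line decoder state: keypresses arrive one byte
 * at a time, so a multi-byte utf-8 sequence must be accumulated before
 * it can be inserted into the input line as a single code point.
 */
struct utf8_decoder {
    uint32_t codepoint;   /* code point being assembled */
    int      remaining;   /* continuation bytes still expected */
};

/*
 * Feed one byte.  Returns 1 and stores the finished code point in *out
 * when a character is complete, 0 if more bytes are needed, and -1 on a
 * malformed sequence (the decoder resets itself in that case).
 * Sketch only: does not reject overlong encodings, surrogates, or
 * values above U+10FFFF.
 */
int utf8_feed(struct utf8_decoder *d, unsigned char byte, uint32_t *out)
{
    if (d->remaining == 0) {
        if (byte < 0x80) {                  /* 0xxxxxxx: plain ascii */
            *out = byte;
            return 1;
        } else if ((byte & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte lead */
            d->codepoint = byte & 0x1F;
            d->remaining = 1;
        } else if ((byte & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte lead */
            d->codepoint = byte & 0x0F;
            d->remaining = 2;
        } else if ((byte & 0xF8) == 0xF0) { /* 11110xxx: 4-byte lead */
            d->codepoint = byte & 0x07;
            d->remaining = 3;
        } else {                            /* stray continuation byte */
            return -1;
        }
        return 0;
    }

    if ((byte & 0xC0) != 0x80) {            /* expected 10xxxxxx */
        d->remaining = 0;
        return -1;
    }
    d->codepoint = (d->codepoint << 6) | (byte & 0x3F);
    if (--d->remaining == 0) {
        *out = d->codepoint;
        return 1;
    }
    return 0;
}
```

Feeding the two bytes 0xC3 0xB6 (a utf-8 "ö") returns 0 after the first byte and 1 after the second, with the code point U+00F6, so the input line would receive one character instead of two.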

The display: Right now, epic doesn't handle encoding on the output display.
Any bytes received are just sent to the display, so if you output a utf-8
string on a utf-8 emulator, it will show up correctly, and if you output
a utf-8 string on an iso-8859-* emulator, it will show up as multiple
(incorrect) characters.  Column counting is (of course) broken, since each
byte is counted as a column even when several bytes make up one character.
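As an illustration of the column problem, counting characters instead of bytes in a utf-8 string only requires skipping continuation bytes. This is a simplified sketch with a hypothetical function name; it assumes every code point is exactly one column wide, which is not true for wide (CJK) or combining characters.

```c
#include <stddef.h>

/*
 * Count the characters in a utf-8 string by skipping continuation
 * bytes (10xxxxxx).  As a rough stand-in for display width this assumes
 * every code point occupies exactly one column; wide and zero-width
 * characters would need something like wcwidth() on top of this.
 */
size_t utf8_columns(const char *s)
{
    size_t cols = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            cols++;
    return cols;
}
```

For "fr\xC3\xB6nd" (frönd in utf-8) this gives 5 columns, where counting bytes gives 6.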

The servers: Globally, the user can /set translation, which converts code
points from one 8-bit character set (usually ascii) into another 8-bit
character set that the server is using.  This is fine, as long as both the
user and the server are using 8-bit code points (which, obviously, is not
the case for utf-8).
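To see why a byte-for-byte table can't cope with utf-8, here is a sketch of a 256-entry translation table in the spirit of /set translation. It is hypothetical and is not meant to describe how epic actually stores its tables. Because each byte is looked up on its own, a two-byte utf-8 character such as ã (0xC3 0xA3) is treated as two unrelated 8-bit characters, and any mapping defined for 0xC3 or 0xA3 corrupts it.

```c
#include <stddef.h>

/*
 * Hypothetical byte-for-byte translation table: each of the 256
 * possible byte values maps to exactly one other byte.  It starts as
 * the identity and a caller patches in whatever mappings a given pair
 * of 8-bit character sets needs.
 */
static unsigned char xlate[256];

void xlate_init_identity(void)
{
    for (int i = 0; i < 256; i++)
        xlate[i] = (unsigned char)i;
}

/* Translate a buffer in place, one byte at a time, with no notion of
 * multi-byte characters. */
void xlate_buffer(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = xlate[buf[i]];
}
```

If some 8-bit mapping sends 0xC3 to another byte, the utf-8 sequence 0xC3 0xA3 comes out with only its first byte changed, which is no longer valid utf-8.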

Channel names: Channel names can be in any encoding.  A channel
name like #frãnd could be encoded in iso-8859-1 and take up 6 bytes,
or the channel name could be encoded in utf-8 and take up 7 bytes.  The
irc server will treat these as separate channels, ***so it's fundamentally
important to be able to specify an encoding when specifying a channel name.***
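The 6-byte versus 7-byte difference for #frãnd comes from ã being U+00E3: iso-8859-1 stores it as the single byte 0xE3, while utf-8 needs the two bytes 0xC3 0xA3. A small C sketch of re-encoding iso-8859-1 as utf-8 (the function name is hypothetical) shows where the extra byte comes from.

```c
#include <stddef.h>

/*
 * Re-encode an iso-8859-1 string as utf-8.  Every iso-8859-1 byte is
 * the code point of the same value (U+0000..U+00FF), so bytes below
 * 0x80 are copied unchanged and the rest become a two-byte sequence.
 * `out` must have room for 2 * strlen(in) + 1 bytes.  Returns the number
 * of bytes written, not counting the terminating NUL.
 */
size_t latin1_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            out[n++] = (char)*p;
        } else {
            out[n++] = (char)(0xC0 | (*p >> 6));    /* 110000xx */
            out[n++] = (char)(0x80 | (*p & 0x3F));  /* 10xxxxxx */
        }
    }
    out[n] = '\0';
    return n;
}
```

latin1_to_utf8("#fr\xE3nd", buf) returns 7 for the 6-byte input, and since the two byte strings differ, the server sees two different channels.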

Channel messages: People who chat on the channel may use any encoding at
any time, but usually everyone uses the same encoding, which
***may or may not be the same encoding as the channel name itself***.
For example, the channel name may be encoded in iso-8859-1 while the users
agree to use utf-8.  ***So it's fundamentally important to be able to
specify a different encoding for privmsgs on the channel than the one used
for the channel name itself.***
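One way to hold these two independent settings per server and per channel is a small table like the one below. The struct and lookup function are hypothetical illustrations, not epic code, and a real lookup would also need irc case-folding for channel names, which plain strcmp() does not do.

```c
#include <string.h>

/*
 * Hypothetical per-channel encoding settings: one encoding for the bytes
 * of the channel name as the server sees it, and an independent one for
 * the text of privmsgs sent to and received from the channel.
 */
struct channel_encoding {
    const char *server;           /* which server connection */
    const char *channel;          /* channel name as the user typed it */
    const char *name_encoding;    /* e.g. "iso-8859-1" */
    const char *message_encoding; /* e.g. "utf-8" */
};

/*
 * Look up the settings for a channel on a server; NULL if none.
 * Uses exact strcmp(), so it ignores irc case-insensitivity.
 */
const struct channel_encoding *
find_channel_encoding(const struct channel_encoding *tab, size_t n,
                      const char *server, const char *channel)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].server, server) == 0 &&
            strcmp(tab[i].channel, channel) == 0)
            return &tab[i];
    return NULL;
}
```

Keeping the two encodings as separate fields is what lets a channel whose name is in iso-8859-1 still carry utf-8 privmsgs.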

THEREFORE,
we're going to have to start thinking about syntax for how to specify
all this stuff on a per-channel, per-server basis.  As a wild example,
we could prefix channel names with an encoding, using characters that
can't begin a channel name.

Example:
        /join (iso-8859-1)#frönd
(join the channel, encoding the channel name in iso-8859-1)

        /join (utf-8)#frönd
(join the channel, encoding the channel name in utf-8)

        /join (iso-8859-1/utf-8)#frönd
(the channel name is encoded in iso-8859-1, but privmsgs will be encoded
in utf-8)
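For the prefix syntax floated above, a parser only has to split off the parenthesized part and an optional "/" inside it. This C sketch is one way it could work. The function name is hypothetical, and it assumes that encoding names never contain ")" and that a missing "/msg-encoding" means privmsgs use the channel-name encoding; neither rule has been decided.

```c
#include <stddef.h>
#include <string.h>

/*
 * Parse the hypothetical "(NAME-ENC[/MSG-ENC])#channel" argument.
 * On success, copies the encodings into name_enc and msg_enc (each
 * `encsz` bytes), points *chan at the channel name, and returns 0.
 * Assumption: with no "/", msg_enc defaults to name_enc.
 * Returns -1 if there is no well-formed prefix or no channel name.
 */
int parse_encoding_prefix(const char *arg, char *name_enc, char *msg_enc,
                          size_t encsz, const char **chan)
{
    const char *rparen, *slash, *name_end;
    size_t len;

    if (arg[0] != '(' || (rparen = strchr(arg, ')')) == NULL)
        return -1;
    if (rparen[1] == '\0')                  /* nothing after ")" */
        return -1;

    /* Look for "/" only inside the parentheses. */
    slash = memchr(arg + 1, '/', (size_t)(rparen - arg - 1));
    name_end = slash ? slash : rparen;

    len = (size_t)(name_end - (arg + 1));
    if (len == 0 || len >= encsz)
        return -1;
    memcpy(name_enc, arg + 1, len);
    name_enc[len] = '\0';

    if (slash) {
        len = (size_t)(rparen - (slash + 1));
        if (len == 0 || len >= encsz)
            return -1;
        memcpy(msg_enc, slash + 1, len);
        msg_enc[len] = '\0';
    } else {
        strcpy(msg_enc, name_enc);          /* assumed default */
    }

    *chan = rparen + 1;
    return 0;
}
```

Given "(iso-8859-1/utf-8)#frönd" it yields the name encoding "iso-8859-1", the message encoding "utf-8", and a pointer to "#frönd"; an argument without a leading "(" returns -1, so the caller can fall back to whatever default encoding applies.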

The last thing I want to do is support utf-8 and then have it end up
half-assed, making everyone think I'm a clod for not thinking of every
important detail.  So now is the time to tell me what's really important
for supporting a multi-encoding irc client!

Thanks for your discussion!
Jeremy
_______________________________________________
List mailing list
List@epicsol.org
http://epicsol.org/mailman/listinfo/list
