Re: intelligent charset recognition for irc

Roman Czyborra Wed, 04 Oct 2000 04:57:08 -0700
Dear Martin:

> Well, on the irc channels I am on, iso 646 is still used, however
> it is typed manually by people without access to a Swedish
> keyboard. You are right that this is a crude hack, but there is no
> way to specify your charset on irc, unfortunately.

This is not true. IRC can already do better than that, at least on the
client side, as ircII shows:

        /VERSION
        *** Client: ircII 4.4M (internal version 20000126)
        /HELP SET TRANSLATION
        *** Help on translation
        Usage: SET TRANSLATION <character translation table>
          The TRANSLATION variable defines a character translation
          table.  By default, ircII assumes that all text processed
          over the network is in the ISO 8859/1 map, also known as
          Latin-1.  This is identical to standard ASCII, except that
          it is extended with additional characters in the range
          128-255.  Many environments by default use the Latin-1 map,
          such as X Windows, MS Windows, AmigaDOS, and modern ANSI
          terminals including Digital VT200, VT300, VT400 series and
          MS-Kermit.  However, many older environments use non-standard
          extensions to ASCII, and yet others use 7-bit national
          replacement sets.

          Some available settings for the TRANSLATION variable:

          8-bit sets:
            HP_MCS              Hewlett Packard Extended Roman 8.
            MACINTOSH           Apple Macintosh computers and boat
                                anchors.
            CP437               Old IBM PC, compatibles and Atari ST.
            CP850               New IBM PC compatibles and IBM PS/2.
            DEC_MCS             DEC Multinational Character Set.
                                VAX/VMS.  VT320's and other 8-bit
                                Digital terminals use this set by
                                default, but I recommend changing to
                                Latin-1 in the terminal Set-Up.
            DG_MCS              Data General Multinational Character Set.
            NEXT                NeXT.

          7-bit sets:
            ASCII               ANSI ASCII, ISO Reg. 006.  For American
                                terminals in 7-bit environments.  Default.
            DANISH              Norwegian/Danish.
            DUTCH               Dutch.
            FINNISH             Finnish.
            FRENCH              ISO French, ISO Reg. 025.
            FRENCH_CANADIAN     French in Canada.
            GERMAN              ISO German, ISO Reg. 021.
            IRV                 International Reference Version, ISO
                                Reg. 002.  For pedantic use in ISO 646
                                environments.
            ITALIAN             ISO Italian, ISO Reg. 015.
            JIS                 JIS ASCII, ISO Reg. 014.  Japanese
                                ASCII hybrid.
            NORWEGIAN_1         ISO Norwegian, Version 1, ISO Reg. 060.
            NORWEGIAN_2         ISO Norwegian, Version 2, ISO Reg. 061.
            POLISH              Converts windows codepage 1250 to ISO-8859-2
            POLISH_NOPL         Converts both cp1250 and iso8859-2 to latin
                                equivalents
            PORTUGUESE          ISO Portuguese, ISO Reg. 016.
            PORTUGUESE_COM      Portuguese on Digital terminals.
            RUSSIAN             Russian.
            RUSSIAN_ALT         Alternative Russian.
            RUSSIAN_WIN         Russian with Windows.
            SPANISH             ISO Spanish, ISO Reg. 017.
            SWEDISH             ISO Swedish, ISO Reg. 010.
            SWEDISH_NAMES       ISO Swedish for Names, ISO Reg. 011.
            SWEDISH_NAMES_COM   Swedish.  Digital, Hewlett Packard.
            SWISS               Swiss.
            UNITED_KINGDOM      ISO United Kingdom, ISO Reg. 004.
            UNITED_KINGDOM_COM  United Kingdom on DEC and HP terminals.
        See Also:
          DIGRAPH
          BIND ENTER_DIGRAPH
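
What such a translation table amounts to can be sketched roughly as follows, in Python rather than ircII's own C. The sketch assumes that the SWEDISH set differs from ASCII in the six letter positions of ISO 646-SE and leaves out the currency sign at 0x24. The names are made up for illustration and are not taken from translat.c:

```python
# Hypothetical illustration of a 7-bit national set <-> Latin-1 table.
# Assumption: ISO 646-SE puts the Swedish letters where ASCII has [ \ ] { | }.
ASCII_POSITIONS = b"[\\]{|}"
SWEDISH_LETTERS = "ÄÖÅäöå".encode("latin-1")

# 256-entry byte tables, one per direction.
SWEDISH_TO_LATIN1 = bytes.maketrans(ASCII_POSITIONS, SWEDISH_LETTERS)
LATIN1_TO_SWEDISH = bytes.maketrans(SWEDISH_LETTERS, ASCII_POSITIONS)


def from_terminal(raw: bytes) -> bytes:
    """Bytes typed on a 7-bit Swedish terminal -> Latin-1 for the network."""
    return raw.translate(SWEDISH_TO_LATIN1)


def to_terminal(wire: bytes) -> bytes:
    """Latin-1 bytes from the network -> 7-bit Swedish for the terminal."""
    return wire.translate(LATIN1_TO_SWEDISH)


if __name__ == "__main__":
    typed = b"r{ksm|rg}s"
    print(from_terminal(typed).decode("latin-1"))
```

The catch with any 7-bit set shows up in the first direction: a literal bracket or brace typed on such a terminal also comes out as a letter, because the table cannot tell the two uses apart.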

In http://czyborra.com/utf/#UTF-8 I wrote:

        PGP 5.0i and IRC II-4.4 still use Latin1 as their canonical
        text encoding instead of UTF-8: {cp850,ebcdic}_to_latin1
        in pgp-5.0i/src/lib/pgp/helper/pgpCharMap.c and
        ircii-4.4/source/translat.c 

They oughta move to UTF-8, though.
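
To make the difference concrete, here is a small sketch in Python (not code from PGP or ircII) of converting from a local code page to a canonical encoding. The function name recode is hypothetical, and the codec names are the ones Python's standard library happens to use:

```python
def recode(raw: bytes, local: str, canonical: str) -> bytes:
    """Convert bytes from a local charset to a canonical wire encoding."""
    return raw.decode(local).encode(canonical)


if __name__ == "__main__":
    # Letters that exist in both cp850 and Latin-1 survive either way.
    local_bytes = "blåbär".encode("cp850")
    print(recode(local_bytes, "cp850", "latin-1"))
    print(recode(local_bytes, "cp850", "utf-8"))

    # cp850 byte 0xC4 is a box-drawing character with no Latin-1 slot,
    # so only the UTF-8 conversion can represent it.
    box = b"\xc4"
    print(recode(box, "cp850", "utf-8"))
    try:
        recode(box, "cp850", "latin-1")
    except UnicodeEncodeError as err:
        print("not representable in Latin-1:", err.reason)
```

With Latin-1 as the canonical form, every local character outside its 256 slots has to be dropped or approximated; with UTF-8 the conversion is lossless for any local charset that maps into Unicode.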

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/