Hello.
Here is my point of view on how translations should work.
I noticed that at least three people are interested in this,
so I want to discuss the subject before any implementation,
because I really want it to be useful for a wide group of people.
Please excuse me if I am not clear enough, and for being long.
Purpose
Generally, the translation module should provide a way to
properly display characters used in IRC messages that are
received in different charsets or different encodings, or
that are missing from the user's fonts.
And IMHO it should be done in the EPIC core for efficiency.
Problems
One problem is how to present symbols that are missing from the console codepage or font.
I see two solutions here:
- fallback mappings (like in the Linux console):
  presenting missing symbols with similar letters,
  e.g. Ukrainian 'i' with Latin 'i', Latin-2 'o umlaut' with 'o'.
- transliteration:
  transcribing missing symbols into other letters, usually phonetically or with
  RFC 1345 mnemonics,
  e.g. Cyrillic in Latin transcription, kanji in kana, 'o umlaut' as ":o".
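A minimal sketch of the difference between the two approaches. The tables are tiny hypothetical samples (a real build would load them from map files), and the function name apply_map is made up for illustration:

```python
# Hypothetical sample tables, not real map files.
FALLBACK = {
    "\u0456": "i",   # Cyrillic (Ukrainian) i -> Latin i
    "\u00f6": "o",   # o with umlaut -> plain o
}
TRANSLIT = {
    "\u00f6": ":o",  # o with umlaut, in the ":o" form used in the text above
    "\u0436": "zh",  # Cyrillic zhe -> phonetic Latin transcription
}

def apply_map(text, table):
    """Replace every character that has an entry in table; keep the rest."""
    return "".join(table.get(ch, ch) for ch in text)

print(apply_map("k\u00f6\u0456", FALLBACK))   # one similar letter per symbol
print(apply_map("\u00f6\u0436", TRANSLIT))    # one symbol may become several chars
```

The point of the sketch is that a fallback keeps the text length, while a transliteration may expand one symbol into several characters.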
Another problem is that some encodings are variable-width,
e.g. UTF-8, Big5 (1-2 bytes per symbol), EUC-* (1-4 bytes),
and some strange transcription that Japanese people use on IRC
to encode kanji into ASCII (I do not know what it is called).
For such encodings it is almost impossible to build direct translation maps
like those for 8-bit charsets.
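To illustrate why a flat 256-entry table is not enough here, the following sketch measures how many bytes a few sample characters take in several encodings. It uses only Python's standard codecs; the helper name byte_width and the sample characters are chosen for illustration:

```python
# Sample characters: Latin a, Cyrillic i (U+0456), and the Han character U+6F22.
SAMPLES = ["a", "\u0456", "\u6f22"]

def byte_width(ch, encoding):
    """Bytes that ch occupies in encoding, or None if it is not representable."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

for encoding in ("utf-8", "big5", "euc_jp", "koi8_r"):
    print(encoding, [byte_width(ch, encoding) for ch in SAMPLES])
```

Since the width depends on the character, a single byte value cannot be looked up on its own; the decoder has to know where each symbol starts and ends before any mapping is applied.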
However, I want to make translations flexible enough to handle
any kind of encoding used on IRC, and to allow defining
additional fallback/transliteration mappings.
My suggestions
Specify <encoding> as <type>/<map>,
where <type> defines how characters are represented in the stream:
7bit, 8bit, 16bit, utf8 or something else;
<map> defines how symbols/letters are represented by chars;
for 8-bit encodings it is actually the codepage name.
The <map> parameter should also define the range of characters
covered by the charset, to decide whether mapping filters
should be applied. For utf8, <map> only defines the charset range.
The encoding <type> can actually be derived from the charmap definition.
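A possible way to parse such a specification, as a sketch only. The names Spec, DEFAULT_TYPE and parse_spec are hypothetical, and the DEFAULT_TYPE table stands in for "the type can be derived from the charmap definition":

```python
from typing import NamedTuple

class Spec(NamedTuple):
    kind: str   # stream type: 7bit, 8bit, 16bit, utf8, ...
    name: str   # charmap / codepage name

# Hypothetical defaults; a real build would read the type from the charmap file.
DEFAULT_TYPE = {"ascii": "7bit", "koi8r": "8bit", "utf8": "utf8"}

def parse_spec(text):
    """Parse '<type>/<map>', or a bare '<map>' whose type is looked up."""
    kind, sep, name = text.strip().lower().partition("/")
    if sep:
        if not kind or not name:
            raise ValueError("expected <type>/<map>, got %r" % text)
        return Spec(kind, name)
    if kind not in DEFAULT_TYPE:
        raise ValueError("unknown charmap %r, give <type>/<map>" % text)
    return Spec(DEFAULT_TYPE[kind], kind)

print(parse_spec("8bit/koi8r"))
print(parse_spec("koi8r"))
```

The same parser would work for filter names such as fallback/cyr, since they use the same <type>/<map> shape.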
Specify <filter> as <type>/<map> again,
where <type> is 'translit' or 'fallback' or something else,
and <map> is the actual mapping. Mapping tables can be loaded the same
way as charmaps, when the translation map is being built.
For each type of encoding and filtering, there should be
loading and mapping functions in the core, which can load a specific
map from files on demand and are used either for building a
direct translation map or for the actual message translation.
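The "one loader per type, loaded on demand" idea could look roughly like this. Everything here is hypothetical (LOADERS, loader, get_map, and the stand-in loader that returns a fixed table instead of reading a file); the real thing would live in the C core:

```python
LOADERS = {}   # type name -> loading function
_cache = {}    # (type, map name) -> loaded table

def loader(kind):
    """Decorator that registers a loading function for one filter/encoding type."""
    def register(fn):
        LOADERS[kind] = fn
        return fn
    return register

@loader("fallback")
def load_fallback(name):
    # Stand-in for parsing a map file; returns a fixed sample table.
    return {"\u0456": "i"} if name == "cyr" else {}

def get_map(kind, name):
    """Load the map on first use and reuse the cached table afterwards."""
    key = (kind, name)
    if key not in _cache:
        _cache[key] = LOADERS[kind](name)
    return _cache[key]

print(get_map("fallback", "cyr"))
```

Adding a new kind of filter then only means registering one more loading function.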
Now, there should be a global and a per-window variable TRANSLATION:
translation := encoding filters...
    encoding defines the terminal or window encoding,
    filters is a list of filter names to be applied.
filters := >flist        - apply to convert FROM the given encoding
           <flist        - apply to convert TO the given encoding
flist   := flist,filter  - apply filter if the other filters fail
filter  := filtname
           filter+filter - apply sequentially
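A sketch of a parser for this grammar, assuming that the encoding and the filter lists are separated by whitespace and that ',' binds more loosely than '+'. The function name parse_translation and the shape of the result are made up for illustration:

```python
def parse_translation(value):
    """Parse 'encoding >flist <flist' into a dict.

    Each flist becomes a list of alternatives (split on ','),
    and each alternative is a list of filters applied in order (split on '+').
    """
    parts = value.split()
    if not parts:
        raise ValueError("empty TRANSLATION value")
    result = {"encoding": parts[0], "from": [], "to": []}
    for token in parts[1:]:
        direction = {">": "from", "<": "to"}.get(token[0])
        if direction is None or len(token) == 1:
            raise ValueError("bad filter list %r" % token)
        result[direction] = [alt.split("+") for alt in token[1:].split(",")]
    return result

print(parse_translation("koi8r <fb/win,fb/cyr,tr/latin"))
```

Keeping the alternatives and the sequential chains as nested lists makes the "try the next one if this fails" logic straightforward later on.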
E.g.:
To somehow display various windows-* symbols that are missing from koi8r on the console,
Ukrainian and Belarusian letters as their Russian equivalents,
and to convert Latin-1 and Latin-2 umlauts to ":o", ":a", etc.:
    TRANSLATION = koi8r <fb/win,fb/cyr,tr/latin
If Russian text should be sent to an ASCII-only channel
(but I still want to type it in Cyrillic):
    WIN TRANS = ASCII <tr/cyr
To display kanji characters from an SJIS channel via their kana
representation in Russian phonetic transcription, and to send my Russian text
phonetically converted to kana if possible, or otherwise to Latin:
    WIN TRANS = SJIS >tr/kanji2kana+tr/kana2cyr <tr/cyr2kana,tr/cyr2lat
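How ',' and '+' might behave at translation time, as a rough sketch. The text does not define what "fail" means, so this assumes a chain fails for a character when any of its stages has no entry for its input. The toy maps, the displayable test and all function names are hypothetical:

```python
def run_chain(ch, chain, maps):
    """Apply the filters of one '+' chain in order; None means the chain failed."""
    text = ch
    for name in chain:
        table = maps[name]
        if any(c not in table for c in text):
            return None
        text = "".join(table[c] for c in text)
    return text

def translate(text, alternatives, maps, displayable):
    """Try each ',' alternative in turn for every undisplayable character."""
    out = []
    for ch in text:
        result = None
        if not displayable(ch):
            for chain in alternatives:
                result = run_chain(ch, chain, maps)
                if result is not None:
                    break
        out.append(ch if result is None else result)
    return "".join(out)

# Toy maps: only one Cyrillic letter has a kana entry, two have Latin entries.
MAPS = {
    "tr/cyr2kana": {"\u0430": "\u3042"},              # Cyrillic a -> hiragana a
    "tr/cyr2lat": {"\u0434": "d", "\u0430": "a"},     # Cyrillic de, a -> Latin
}
is_ascii = lambda c: ord(c) < 128
print(translate("\u0434\u0430!", [["tr/cyr2kana"], ["tr/cyr2lat"]], MAPS, is_ascii))
```

Characters for which every alternative fails are left unchanged here; they are the ones the replacement variables would deal with.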
Finally, there should be some variables like
TRANS_REPL_CHAR <string> - to replace each char that is still undisplayable
TRANS_REPL_STR <string>  - to replace a whole string of such chars,
                           instead of highlighting every single char.
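One possible reading of the two variables, sketched below: the repl_char argument plays the role of TRANS_REPL_CHAR and is used once per undisplayable character, while repl_str plays the role of TRANS_REPL_STR and, when set, replaces each whole run of such characters. That precedence is an assumption, as are the function and parameter names:

```python
from itertools import groupby

def replace_undisplayable(text, displayable, repl_char="?", repl_str=None):
    """Replace characters that fail the displayable test.

    With repl_str set, each run of bad characters becomes one repl_str;
    otherwise every bad character becomes one repl_char.
    """
    out = []
    for ok, run in groupby(text, key=displayable):
        run = "".join(run)
        if ok:
            out.append(run)
        elif repl_str is not None:
            out.append(repl_str)
        else:
            out.append(repl_char * len(run))
    return "".join(out)

is_ascii = lambda c: ord(c) < 128
print(replace_undisplayable("ab\u0434\u0430c", is_ascii))
print(replace_undisplayable("ab\u0434\u0430c", is_ascii, repl_str="[...]"))
```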
Seems that's all for now.
--
qMax