Re: [PATCH] console UTF-8 fixes

Jan Engelhardt Sat, 07 Apr 2007 04:03:47 -0700

Hi,


I just wanted to give my opinion on things...

(Enable UTF-8 to read this properly.)

On Apr 7 2007 11:24, Egmont Koblinger wrote:
>
>> I strongly disagree.  First of all, you're changing the semantics of a 
>> 13-year-old API.  The semantics of the Linux console is that by 
>> specifying U+FFFD SUBSTITUTION GLYPH in your unicode table, you have 
>> specified the fallback glyph.
>
>OK, I'm not against using U+FFFD for missing glyphs. In the meantime I
>think it's still a good idea to clearly separate the two cases in the code
>(that is, the case of invalid sequence from the case of missing glyph), but
>we can still use the same replacement character in these two cases. I'll
>send an updated patch after Easter if it sounds good to you.

I am quite ok with the way things are right now.

 - vc displays <?> for illegal sequences

 - vc displays e.g. "U" (latin capital U) in place of Û (latin capital
   U with circumflex) when the latter is not available in the font,
   as determined by the unicode map. (I do use a unicode map, because
   I use a 4096-byte cp437 "DOS" font, which requires one.)

 - vc displays <?> for sequences it does not know how to print

 - xterm displays <?> for illegal sequences

 - xterm seems to display <?> on some undefined glyphs (e.g. U+DFFF,
   using the "Unicode Best" font from the xterm menu)

 - xterm seems to display nothing on other undefined glyphs (e.g. U+E000,
   "Unicode Best" again)
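
The vc cases listed above (an illegal sequence versus a valid character
that has no glyph) can be told apart roughly as follows. This is a minimal
userspace sketch in Python, not kernel code; FONT_GLYPHS and classify() are
hypothetical names, and the tiny glyph set merely stands in for whatever
the loaded unicode map actually covers.

```python
# Hypothetical stand-in for a font's unicode map: the set of code points
# that have a glyph (printable ASCII plus U+FFFD here, as an assumption).
FONT_GLYPHS = set(range(0x20, 0x7F)) | {0xFFFD}


def classify(raw: bytes):
    """Split raw bytes into (kind, char) pairs.

    kind is 'ok', 'missing-glyph' (valid UTF-8, no glyph in the font)
    or 'illegal-sequence' (byte that does not start a valid sequence).
    """
    out = []
    i = 0
    while i < len(raw):
        # A UTF-8 character is 1 to 4 bytes long; try the shortest first.
        for n in (1, 2, 3, 4):
            try:
                ch = raw[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                continue
            kind = "ok" if ord(ch) in FONT_GLYPHS else "missing-glyph"
            out.append((kind, ch))
            i += n
            break
        else:
            # No length worked: report one bad byte and resynchronise.
            out.append(("illegal-sequence", "\ufffd"))
            i += 1
    return out


if __name__ == "__main__":
    # 'A', then U+00DB (valid but not in the toy font), then a stray 0xFF.
    for kind, ch in classify(b"A\xc3\x9b\xff"):
        print(kind, "U+%04X" % ord(ch))
```

Whether the two non-'ok' kinds end up drawn with the same glyph is then a
separate, purely visual decision.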

>> What's worse, you've hard-coded the uses of specific visual 
>> representations.  That is completely unacceptable.
>
>Now that we've dropped the idea of "dot" for missing glyphs, the other thing
>
>[...]
>
>Sorry, I wasn't clear enough and I think you misunderstood me. The symbol I
>choose for fallback is still '?' (the ASCII question mark), I just invert
>the color attributes of the cell where this is printed. This way it becomes
>visually distinguishable from the literal question mark. Using the current
>kernel you just cannot know whether the character printed is a real question
>mark, or a replacement glyph. Still, should you strongly disagree with this
>decision, the color inverting part can easily be removed.

Please, no dot, and no inverse color.
Imagine someone had the following bitmap for <unknown glyph/illegal sequence>:

################
################
################
####........####
##....####....##
##....####....##
########....####
######....######
######....######
################
######....######
######....######
################
################
################
################

Inverting that again would make it easy to confuse with the regular
'?' at 0x3F.

(cp437 for example maps unknown/illegal to 0xFE, which happens to be the
block graphic '■', but YMMV depending on font.)
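
To make the inverse-video concern concrete, here is a small Python sketch
that inverts the 16x16 bitmap shown above, which is what swapping the
cell's colour attributes amounts to visually. Reading '#' as a lit pixel
and '.' as an unlit one is an assumption about the drawing, and invert()
is a hypothetical helper.

```python
# The replacement-glyph bitmap from the mail, one string per scanline.
GLYPH = """\
################
################
################
####........####
##....####....##
##....####....##
########....####
######....######
######....######
################
######....######
######....######
################
################
################
################
""".splitlines()


def invert(rows):
    """Swap lit ('#') and unlit ('.') pixels in every scanline."""
    swap = str.maketrans("#.", ".#")
    return [row.translate(swap) for row in rows]


if __name__ == "__main__":
    for row in invert(GLYPH):
        print(row)
```

Under that reading, the inverted cell is a question-mark stroke in lit
pixels on an unlit background, i.e. much the same picture as an ordinary
'?' glyph.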

>I think I've (mostly) described it above. Set everything to UTF-8, load a
>latin2 font (containing 256 glyphs, e.g. "setfont lat2-16"), make an
>application print U+00FB (alt + numpad 251 is one trivial way), you'll see
>an "u with double accent", though the symbol to be displayed is "u with
>circumflex". This isn't present in the current font, so the replacement
>character should appear, not a different letter.

I blame your latin2 unicode map. (See above about 'Û'.)
It should perhaps display a regular 'u' if it cannot display 'û',
but definitely not 'ü' (which is not called a double accent, btw).
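
A rough Python sketch of the "show the base letter if the accented one is
missing" behaviour follows. Note that it swaps in Unicode NFD decomposition
to find the base letter, purely for illustration; as said above, on the
console the choice comes from the font's unicode map. fallback() and the
glyph sets are hypothetical.

```python
import unicodedata


def fallback(ch: str, glyphs: set) -> str:
    """Pick what to draw for ch, given the set of characters with glyphs."""
    if ch in glyphs:
        return ch
    # NFD puts the base letter first, followed by combining marks,
    # e.g. U+00FB becomes 'u' + U+0302 (combining circumflex).
    base = unicodedata.normalize("NFD", ch)[0]
    if base != ch and base in glyphs:
        return base
    # Nothing suitable: use the replacement character.
    return "\ufffd"


if __name__ == "__main__":
    toy_font = set("abcdefghijklmnopqrstuvwxyz")
    print(fallback("\u00fb", toy_font))  # drawn as the plain base letter
```

The point is only that the fallback for 'û' is either the unaccented 'u'
or the replacement glyph, never a different accented letter.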

>> To be able to do CJK you need something like Kon anyway.  This feels 
>> like bloat.
>
>I don't want CJK support. All that I want is to be able to edit English
>words within a file that contains mixture of English and CJK, with a text
>editor like vim or joe.

+1 for this one :)

xterm## echo "韓国と日本にようこそ!" >/tmp/foobar.txt
vc## cat /tmp/foobar.txt

This currently does not come out quite right, because a multibyte
character is not displayed with as many <?> as it is columns wide.
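
What "as many <?> as they are wide" would mean can be sketched in Python as
below. This is an illustration under assumptions, not the console code:
cell_width() treats East Asian Wide and Fullwidth characters as two columns
and everything else as one (combining characters are ignored for brevity),
a plain '?' stands in for the replacement glyph, and render() and the
ASCII-only glyph set are hypothetical.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Columns a character occupies: 2 for East Asian Wide/Fullwidth."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def render(text: str, glyphs: set) -> str:
    """Replace each character without a glyph by one '?' per column."""
    return "".join(
        ch if ch in glyphs else "?" * cell_width(ch) for ch in text
    )


if __name__ == "__main__":
    ascii_font = {chr(c) for c in range(0x20, 0x7F)}
    print(render("韓国と日本にようこそ!", ascii_font))
```

With replacements sized this way, the English parts of a mixed file stay
in the columns an editor such as vim or joe expects them in.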


Jan
