I don't think the language really comes into consideration for this display 
problem... at least not from what I have seen since then. 

Forwarded to the list. 

Jehan 

 ---------- Forwarded message ----------
Message-ID: 
<[EMAIL PROTECTED] 
xe139.bisx.prod.on.blackberry>
To: "jehan" <[EMAIL PROTECTED]>
Subject: Re: [Materm-devel] Fonts and all these stuffs
From: [EMAIL PROTECTED]
Date: Thu, 9 Oct 2008 15:18:31 +0000 

Maybe the language you defined is a hint of how it should be displayed, like 
en_US.UTF-8 and zh_CN.UTF-8? 

Sorry, I forgot to forward this to the list. Please do forward it. 


 -----Original Message-----
From: "jehan" <[EMAIL PROTECTED]> 

Date: Thu, 09 Oct 2008 16:51:14
To: <[EMAIL PROTECTED]>
Subject: Re: [Materm-devel] Fonts and all these stuffs 


Hi, 

after some searching, I found where your "bigger" characters come from. There
is a specific block called "Halfwidth and Fullwidth Forms":
http://www.unicode.org/charts/PDF/UFF00.pdf 

They are the same characters as some other (common) characters, with the same
semantics but a different display: either one column instead of 2 (halfwidth)
or 2 instead of 1 (fullwidth). But the list is very small. In my
language for instance (French), I could not mix it with Japanese because
many characters are missing ("é", "è", etc.). And the '€' is not in this
list either. Conversely, there are very few "2-column" characters with a
"1-column" version...
The full list of both half and full chars is either in the PDF or on
Wikipedia: http://en.wikipedia.org/wiki/Halfwidth_and_Fullwidth_Forms 

So I guess that they somehow take the display width (in terms of "number
of columns") into account in the Unicode list, though they don't seem to
specify it in any of the docs I found on the website. So which width did they
originally expect the '€' character to have? Are there other characters
whose width (compared to other chars) differs depending on the
font? Etc. 

Anyway that's definitely not the solution to our problem. 

I will subscribe to the Unicode mailing list (probably tonight or this
weekend) to ask for more information...
See you. 

Jehan 

P.S.: may I forward this discussion to the mrxvt ml? I think this is
useful information. 

jehan writes: 

> Hi,   
> 
> unfortunately this is the same euro... simply because I got it from my 
> mrxvt branch with UTF-8, pressing the exact same key with the same locale 
> (so same keysym, then same byte encoding), the only thing changing being 
> the Xft font I specified on the command line. In 'Bitstream Vera Sans' (it 
> does not have many non-European characters -- does it have any? -- but it 
> has the euro at least), the euro is one column. In 'Sazanami Gothic' or 
> 'Mikachan', for instance (two Japanese fonts), the euro is 2 columns.   
> 
> Moreover, when you read the full Unicode list:
> http://www.unicode.org/Public/UNIDATA/NamesList.txt   
> 
> There is only one euro sign (2 in fact, but the other is not the same: it 
> is the historical "ecu" sign from the first European Union political 
> projects, so it displays as a different sign. Look at page 189 of this 
> Unicode chart, which has both characters: 
> http://www.unicode.org/Public/5.1.0/charts/CodeCharts.pdf).   
> 
> Anyway, I looked through the whole Unicode website and cannot see any 
> information about columns. I wanted to check whether there is any 
> "recommendation" for the number of columns depending on the character, but 
> I cannot find any. I wanted to know whether any of the fonts was violating 
> some Unicode rule, but apparently not. It looks like the choice is left to 
> the implementation (not that this really changes anything for mrxvt. In my 
> opinion, we must rely on the standards of course, but also work around the 
> "reality" to get the best display when possible). Hence a "general" 
> function like wcwidth may not be really relevant in the end, as it does 
> not take into account the actual font used...
> But I am still searching the Unicode website, because some places mention 
> "fullwidth, halfwidth", etc. So maybe somewhere there is a mention of 
> columns...   
> 
> Jehan   
> 
> Terminator writes:   
> 
>> Hi, Jehan,   
>> 
>> In a UTF-8 terminal, the assumption that one column == one character is
>> definitely wrong. For example, a Chinese character typically occupies
>> more than one column because it is wider than an English character.   
>> 
>> I think wcwidth may be the right function we want to use. Yes, you
>> raised an excellent issue regarding the euro character '€'. However,
>> are you sure that the character with two columns is indeed the same
>> character as the one with just one column? I know that in Chinese, there
>> are also English characters, such as "ａｂｃｄｅｆｇｈｉｊｋｌｍｎ". Do
>> you notice the difference in these characters? They are twice as wide
>> as normal English characters! In fact, they are not the same characters
>> as the English characters "abcdefghijklmn". If you take a look at the raw
>> bytes of these characters, you will see the difference. I wonder if it is
>> the same situation for the euro character '€'. If this is indeed the case,
>> then wcwidth IS the right function to get the column width of each
>> character, regardless of what the character looks like or what font is
>> being used.   
>> 
>> Just my 2 cents.   
>> 
>> On Wed, Oct 8, 2008 at 11:03 AM, Jehan <[EMAIL PROTECTED]> wrote:   
>> 
>>> Hi,   
>>> 
>>> Gautam Iyer wrote:
>>> > I think all terminal emulators do this -- If you give them a
>>> > proportional font, then they treat it as one column = one character.
>>> > That is, you find the width of the widest character in the font, and
>>> > draw all characters that wide.
>>> >   
>>> 
>>> About this, I found this Debian doc:
>>> http://www.debian.org/doc/manuals/intro-i18n/ch-output.en.html   
>>> 
>>> Section 7.1.2 especially deals with this column issue. There are even
>>> library functions (man wcwidth) that give the number of columns. But this
>>> function is not what I need, for several reasons. One reason is that it
>>> does not take into account the font in use, and in my tests I noticed that
>>> characters have a different "number of columns" depending on the font!   
>>> 
>>> For instance, I said in my previous email that the euro character '€' is
>>> displayed in a single column. But since then, I have found some
>>> fonts where it takes 2 columns!   
>>> 
>>> Note that the document also makes a statement close to mine, which is
>>> that we should absolutely not equate the number of bytes with the number
>>> of columns. These are 2 different things which are not related! So it
>>> confirms that the current multicolumn implementation is wrong (it works
>>> in many cases, but not in general).   
>>> 
>>> Anyway, it looks like I should not assume 1 col = 1 char. Moreover, I
>>> don't think the idea of taking the width of the widest character is so
>>> good, because it makes text with one-column characters very ugly and
>>> hard to read (a lot of space between each character is not
>>> good). It is probably better to simply compute each character's size in
>>> terms of columns. I will probably use a reference character to define
>>> the column size: for example, the width of '@' will determine the width
>>> of one column (following the advice in the link from my previous email).
>>> The number of columns of every other character will be computed from
>>> this basis. This way there should not be any issue for curses
>>> applications, I think.
>>> At least I can try and see how it renders.   
>>> 
>>> 
>>> Note that it seems you too once thought it was not a good idea to
>>> take the widest character as the column size (and so have a lot of
>>> space around all the smaller characters). From main.c:   
>>> 
>>> /*
>>> * 2006-01-26 gi1242: Monospaced fonts seem a good idea.
>>> * 2006-01-27 gi1242: Maybe not such a good idea. When we ask for a
>>> * monospaced font from a propotionally spaced font, we just get the same
>>> * old prop font, with a bigass textwidth. That's no use to us. If it
>>> * returned the closest matching mono-spaced font, then that would be
>>> * useful.
>>> */   
>>> 
>>> 
>>> Jehan   
>>> 
>>> 
>>> ------------------------------------------------------------------------ 
>>> -
>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>> challenge
>>> Build the coolest Linux based applications with Moblin SDK & win great
>>> prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in the 
>>> world
>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>_______________________________________________
>>> Materm-devel mailing list
>>> Materm-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/materm-devel
>>> mrxvt home page: http://materm.sourceforge.net   
>>> 

