I'm working on the code.

In the meantime, here is the code for calculating display width:

First, save this text file as EastAsianWidth.txt in the current directory:
http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt

===============================================
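require 'regex jfiles'
t=: 1!:1 <'EastAsianWidth.txt'  NB. read the whole file as one string
NB. single code points (exactly 4 hex digits) with their width class
point=:'^([0-9A-F]{4});(Na|N|H|A|W|F)' rxmatches t
NB. code point ranges (first..last) with their width class
range=:'([0-9A-F]{1,4})\.\.([0-9A-F]{1,4});(Na|N|H|A|W|F)' rxmatches t
NB. }."1 drops the whole-match column and keeps only the subgroups
jcreate 'unidatapoint'
(< }."1 point rxfrom t) jappend 'unidatapoint'
jcreate 'unidatarange'
(< }."1 range rxfrom t) jappend 'unidatarange'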
===============================================

This creates unidatapoint.ijf and unidatarange.ijf, which the J code below reads.
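
For readers who want to cross-check the extraction step outside J, here is a minimal Python sketch of the same idea: split the data lines of EastAsianWidth.txt into single code points and ranges. The names LINE and parse_widths are my own, not part of any library. Unlike the J patterns above, this sketch also accepts 5- and 6-digit code points and optional spaces around the semicolon; both are assumptions about the file layout.

```python
import re

# One data line looks like "0041;Na # comment" or "AC00..D7A3;W # comment".
# Assumption: code points have 4 to 6 hex digits, spaces around ';' are optional.
LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(Na|N|H|A|W|F)\b")

def parse_widths(text):
    """Return (points, ranges) parsed from the content of EastAsianWidth.txt."""
    points, ranges = [], []
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:              # comment line or blank line
            continue
        first, last, code = m.groups()
        if last is None:
            points.append((int(first, 16), code))
        else:
            ranges.append((int(first, 16), int(last, 16), code))
    return points, ranges
```

Calling parse_widths(open('EastAsianWidth.txt').read()) would give two lists that play the same role as unidatapoint and unidatarange.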

===============================================
require 'jfiles'

NB. N  : half
NB. Na : half
NB. H  : half
NB. A  : half
NB. F  : full
NB. W  : full

widthcode=:;: 'N Na H A F W'
pod=:>jread 'unidatapoint';0
rad=:>jread 'unidatarange';0

towc=: widthcode&i. NB. towidthcode

dfh=. 16&#. @ ('0123456789ABCDEF'&i.)  NB. decimal from hex digits
po=:(dfh each {."1 pod),. <"0 towc"0 {:"1 pod
ra=:(,&.>/"1 dfh each 2&{."1 rad),. <"0 towc"0 {:"1 rad
poa=:>{."1 po

fill=: 4 : 0
        'r c'=.x
        r=. ({.r)+ i. >: -~/ r
        ({.c) r}y
)

tab=:65536$0 NB. missing is N
tab=:(> {:"1 po) poa} tab
tab=:>./ ra fill"1 tab

diswid=: [: >: [: 4&<: [: {&tab 3&u:@ucp  NB. 2 for F or W, otherwise 1; for rank 1
================================================
To improve performance, you could save tab in a jfile and load it
directly instead of rebuilding it each time. You could also use a more
compact representation (for example, 3 bits per character, followed by
compression of the data).
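
To make the 3-bit idea concrete, here is a rough Python sketch, assuming the six width codes are numbered 0 to 5 as in widthcode. The helper names pack3 and get3 are hypothetical, and this is only one possible layout (little-endian bit order), not something J provides. With 3 bits per entry, a 65536-entry table would take 24576 bytes before any further compression.

```python
import zlib

def pack3(codes):
    """Pack integers in the range 0..7 into bytes, 3 bits per value."""
    out = bytearray((3 * len(codes) + 7) // 8)
    for i, c in enumerate(codes):
        byte, shift = divmod(3 * i, 8)
        v = c << shift                 # at most 7 << 7, so it fits in 16 bits
        out[byte] |= v & 0xFF
        if v > 0xFF:                   # the field spills into the next byte
            out[byte + 1] |= v >> 8
    return bytes(out)

def get3(packed, i):
    """Read the i-th 3-bit value back out of the packed bytes."""
    byte, shift = divmod(3 * i, 8)
    # A 3-bit field can straddle two bytes, so read a 16-bit window.
    window = int.from_bytes(packed[byte:byte + 2], "little")
    return (window >> shift) & 0b111

codes = [0, 1, 2, 3, 4, 5, 5, 4, 0, 0, 1]   # made-up sample, not real table data
packed = pack3(codes)
restored = [get3(packed, i) for i in range(len(codes))]
smaller = zlib.compress(packed)   # optional second step: general-purpose compression
```

Since the real table consists of long runs of the same code, the zlib step is where most of the saving would likely come from.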

Usage Example:
  diswid '한글ab!─'
2 2 1 1 1 1
  NB. the top line displays properly in a fixed-pitch font
  (,:~ ((ucp'-') $~ +/@diswid)) ucp '한글ab!-'
--------
한글ab!-
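
As a cross-check of the usage example, the same widths can be reproduced with Python's unicodedata module, which is quoted further down in this thread. This is only a sketch: the name diswid is reused here for convenience, and like the J version it treats every character that is not F or W as width 1, so combining characters are not handled.

```python
import unicodedata

def diswid(s):
    """Display width per character: 2 for F and W, 1 for everything else."""
    return [2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1 for ch in s]

text = "한글ab!-"
widths = diswid(text)
print(widths)                 # per-character widths
print("-" * sum(widths))      # rule as wide as the text in a fixed-pitch font
print(text)
```

The rule is built from the sum of display widths rather than from the byte length of the encoded string, and that difference is the point of the whole exercise.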



2007/2/13, Eric Iverson <[EMAIL PROTECTED]>:
The problem of proper display of boxed unicode data is an interesting
one. The first step to getting this fixed is for someone to provide a
working J model that takes an arbitrary boxed argument and produces the
character stream that properly displays it. If we had such a model we
might consider incorporating it into the JE.

----- Original Message -----
From: "June Kim" <[EMAIL PROTECTED]>
To: "General forum" <[email protected]>
Sent: Sunday, February 11, 2007 5:11 AM
Subject: Re: [Jgeneral] wd 'set ...' with box draw characters


> 2007/2/11, Chris Burke <[EMAIL PROTECTED]>:
>> June Kim wrote:
> [snip]
>> > Second, the box is broken with characters of different widths (that
>> > is, when the byte length of the encoding and the width of the
>> > characters on display don't match). What is the usual way of solving
>> > it in other programming languages? There is a Unicode standard for
>> > character widths: http://unicode.org/reports/tr11/
>> >
>> > Python implements that standard (along with others) in its
>> > unicodedata module.
>> >
>> > >>> unicodedata.east_asian_width(u'한')
>> > 'W'
>> > >>> unicodedata.east_asian_width(u'a')
>> > 'Na'
>> >
>> > (The u prefix marks the following string as Unicode.
>> > east_asian_width returns the width of any Unicode character, not
>> > only East Asian ones; its narrow name is historical.)
>> >
> [snip]
>>
>> If you are having problems with display, it is because of the font,
>> not because we are not using unicode.
> [snip]
>
> When a string is boxed and it includes characters whose display width
> differs from their byte length, the box is broken in J. It is not
> because of the font. It is because J assumes that every character's
> width equals its byte length, which is obviously false in many
> writing and encoding systems, including East Asian ones. We can
> definitely say J's box display isn't internationalized yet.
>
> For example, 54620 (as a Unicode code point) is a Korean character
> pronounced "han". Its width is "Wide" (twice as wide as Latin
> letters).
>
>   han=.4 u: 54620
>   <han
> +---+
> |한|
> +---+
>   <8 u: han
> +---+
> |한|
> +---+
>
> Since J uses the byte length to determine a character's width, and
> the byte length of han is 3 in UTF-8 ( 3-: #8 u: han ), the box's
> horizontal character '-' (whose width is "Narrow") is printed three
> times, and the box is broken on the display.
>


--------------------------------------------------------------------------------


> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
