I'm working on the code. In the meantime, here is the code for calculating display width:
First, save the text file at http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
===============================================
require 'regex jfiles'
t=: 1!:1 <'EastAsianWidth.txt'
NB. single code points, e.g. 0020;Na
point=:'^([0-9A-F]{4});(Na|N|H|A|W|F)' rxmatches t
NB. code point ranges, e.g. 1100..115F;W
range=:'([0-9A-F]{1,4})\.\.([0-9A-F]{1,4});(Na|N|H|A|W|F)' rxmatches t
jcreate 'unidatapoint'
(< }."1 point rxfrom t) jappend 'unidatapoint'
jcreate 'unidatarange'
(< }."1 range rxfrom t) jappend 'unidatarange'
===============================================

Now you have unidatapoint.ijf and unidatarange.ijf and can use them.

===============================================
require 'jfiles'
NB. N : half
NB. Na : half
NB. H : half
NB. A : half
NB. F : full
NB. W : full
widthcode=:;: 'N Na H A F W'
pod=:>jread 'unidatapoint';0
rad=:>jread 'unidatarange';0
towc=: widthcode&i.   NB. to width code: index into widthcode (0..5)
dfh=. 16&#. @ ('0123456789ABCDEF'&i.)   NB. hex string to number
po=:(dfh each {."1 pod),. <"0 towc"0 {:"1 pod
ra=:(,&.>/"1 dfh each 2&{."1 rad),. <"0 towc"0 {:"1 rad
poa=:>{."1 po
NB. x is a row of ra: boxed (start,end) and boxed width code;
NB. sets entries start..end of y to that code
fill=: 4 : 0
'r c'=.x
r=. ({.r)+ i. >: -~/ r
({.c) r}y
)
tab=:65536$0   NB. missing is N
tab=:(> {:"1 po) poa} tab
tab=:>./ ra fill"1 tab
NB. 1 for half width, 2 for full width (code 4 or 5, i.e. F or W);
NB. for rank 1; only code points below 65536 are covered by tab
diswid=: [: >: [: 4&<: [: {&tab 3&u:@ucp
================================================

For better performance, you could save tab with jfiles and load it directly. You could also use a more compact representation (3 bits per character, then compress the data).

Usage Example:

   diswid '한글ab!─'
2 2 1 1 1 1
   (,:~ ((ucp'-') $~ +/@diswid)) ucp '한글ab!-'   NB. the top line is exactly as wide as the text in a fixed-pitch font
--------
한글ab!-

2007/2/13, Eric Iverson <[EMAIL PROTECTED]>:
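For comparison, here is a Python sketch of the same steps. It reads single code points and ranges from EastAsianWidth.txt, fills a 65536-entry table that defaults to N, and counts F and W as 2 columns and everything else as 1. It is a re-expression under assumptions, not the code above. The names build_table and diswid are illustrative. The regex assumes lines of the form XXXX;W or XXXX..YYYY;W; it also tolerates spaces around ";", as an assumption in case a given version of the file pads them. Like the J version, it only covers code points below 65536.

```python
import re

# Width classes, as in the J code: F and W are full width (2 columns);
# N, Na, H and A are treated as half width (1 column).
FULL = {"F", "W"}

# Assumed line format: "XXXX;W" or "XXXX..YYYY;W", optional spaces around ';'.
LINE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(Na|N|H|A|W|F)\b"
)

def build_table(lines, size=65536):
    """Map each code point below `size` to its width class.

    Code points not listed in the data default to 'N', as in the J version.
    Entries at or above `size` are skipped.
    """
    table = ["N"] * size
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # comment, blank line, or unrecognised entry
        lo = int(m.group(1), 16)
        hi = int(m.group(2), 16) if m.group(2) else lo
        for cp in range(lo, min(hi, size - 1) + 1):
            table[cp] = m.group(3)
    return table

def diswid(text, table):
    """Display width (1 or 2) of each character, like the J verb diswid.

    A character whose code point is beyond the table raises IndexError,
    mirroring the J version's BMP-only table.
    """
    return [2 if table[ord(ch)] in FULL else 1 for ch in text]
```

With a table built from the full file (for example build_table(open("EastAsianWidth.txt", encoding="utf-8"))), the expression "-" * sum(diswid("한글ab!-", table)) gives the same 8-dash top line as the J example. Python's unicodedata.east_asian_width, mentioned later in this thread, returns the same property from the interpreter's bundled Unicode data without reading the file.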
The problem of proper display of boxed Unicode data is an interesting one.
The first step to getting this fixed is for someone to provide a working J
model that takes an arbitrary boxed argument and produces the character
stream that properly displays it. If we had such a model we might consider
incorporating it into the JE.

----- Original Message -----
From: "June Kim" <[EMAIL PROTECTED]>
To: "General forum" <[email protected]>
Sent: Sunday, February 11, 2007 5:11 AM
Subject: Re: [Jgeneral] wd 'set ...' with box draw characters

> 2007/2/11, Chris Burke <[EMAIL PROTECTED]>:
>> June Kim wrote:
> [snip]
>> > Second, the box is broken by characters of different widths (that is,
>> > when the byte length in the encoding and the display width of the
>> > characters don't match). What is the usual way of solving it in
>> > other programming languages? There is a Unicode standard for
>> > character widths. http://unicode.org/reports/tr11/
>> >
>> > Python implements that standard (along with others) in the
>> > unicodedata module.
>> >
>> >>>> unicodedata.east_asian_width(u'한')
>> > 'W'
>> >>>> unicodedata.east_asian_width(u'a')
>> > 'Na'
>> >
>> > (u specifies that the following string is Unicode. east_asian_width
>> > returns the width property of the character, not only for East Asian
>> > characters but for all Unicode characters; it has a narrow name
>> > because of its history.)
>> >
> [snip]
>>
>> If you are having problems with display, it is because of the font, not
>> because we are not using unicode.
> [snip]
>
> When a string is boxed and it includes characters whose display width
> differs from their byte length, the box is broken in J. It is not
> because of the font. It is because J assumes that every character's
> width is the same as its byte length, which is obviously false in many
> writing and encoding systems, including East Asian ones. We can
> definitely say J's box display isn't internationalized yet.
>
> For example, 54620 (as a Unicode code point) is a Korean character,
> pronounced "han". Its width is "Wide" (twice as wide as Latin
> letters).
>
> han=.4 u: 54620
> <han
> +---+
> |한|
> +---+
> <8 u: han
> +---+
> |한|
> +---+
>
> Since J uses the byte length to determine a character's width, and
> the byte length of han is 3 in UTF-8 ( 3-: #8 u: han ), the box's
> horizontal character '-' (whose width is "Narrow") is printed three
> times, and on the display the box is broken.
> --------------------------------------------------------------------------------
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
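The mismatch described in this thread can be reproduced outside J. The Python sketch below uses Python only because the thread already cites its unicodedata module; the names display_width, box and utf8_len are hypothetical. It draws a one-line box in the style of J's display twice: once sizing the border by UTF-8 byte length, which gives the broken box, and once by display width. It handles a single unboxed string only, not the arbitrary nested boxes Eric's model would need. It also ignores zero-width and combining characters, and it counts ambiguous (A) characters as narrow, as the J code earlier in this message does.

```python
import unicodedata

def display_width(text):
    """Columns `text` occupies in a fixed-pitch font: East Asian Wide (W)
    and Fullwidth (F) characters count as 2, everything else as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def utf8_len(text):
    """UTF-8 byte length: the measure that produces the broken box."""
    return len(text.encode("utf-8"))

def box(text, width=display_width):
    """Draw a one-line box around `text`, sizing the border with `width`."""
    rule = "+" + "-" * width(text) + "+"
    return "\n".join([rule, "|" + text + "|", rule])

print(box("한", utf8_len))  # border of 3 dashes: too wide for one Wide char
print(box("한"))            # border of 2 dashes: matches the character
```

The only difference between the two boxes is the width function passed in, which is where a J model of boxed display would need to substitute a per-character display width for the byte count.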
