Some fonts used for fixed-pitch presentation give ASCII and accented European characters 1 char display space and Asian characters 2 char display spaces.

The J boxed display routine assumes such a font and needs to calculate the width required for each box's contents from those display widths. That is, an ASCII char takes 1 display space, a 2-byte UTF-8 accented European char takes 1 display space, and a 3-byte UTF-8 Asian char takes 2 display spaces.

Although technically there are fancier rules, I think that for almost all practical cases the simple rule is adequate: 1- and 2-byte UTF-8 chars take 1 display space and 3-byte UTF-8 chars take 2 display spaces.

Code that handled this would allow proper display of UTF-8 data with appropriate fixed-pitch fonts.
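
For illustration, the simple rule above can be sketched in Python (not J). The function name simple_display_width is hypothetical, and counting 4-byte sequences as 2 spaces is an assumption of this sketch, since the rule as stated only covers 1-, 2- and 3-byte sequences.

```python
def simple_display_width(text: str) -> int:
    """Approximate fixed-pitch display width from UTF-8 byte lengths."""
    width = 0
    for ch in text:
        nbytes = len(ch.encode("utf-8"))
        # 1- and 2-byte sequences count as 1 display space,
        # 3-byte sequences as 2 display spaces.
        # Assumption of this sketch: 4-byte sequences also count as 2.
        width += 1 if nbytes <= 2 else 2
    return width


print(simple_display_width("abc"))       # three 1-byte chars
print(simple_display_width("\u00e9"))    # e-acute, 2 bytes in UTF-8
print(simple_display_width("\ud55c"))    # Korean "han", 3 bytes in UTF-8
```

One known limit of the rule: some 3-byte characters, such as the box-drawing character U+2500, are usually drawn one space wide, so this rule would count them as 2.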

----- Original Message -----
From: "Oleg Kobchenko" <[EMAIL PROTECTED]>
To: "General forum" <[email protected]>
Sent: Monday, February 12, 2007 11:48 PM
Subject: Re: [Jgeneral] wd 'set ...' with box draw characters


Looking at these East Asian characters in my
email client, IE browser, they are not rendered
as double width, but as a fractional width between 1 and 2
using Courier New font.

3 dashes: too narrow
+---+
|í.oê¸?â"?|
+---+

5 dashes: too wide
+-----+
|í.oê¸?â"?|
+-----+

Also, J currently stubbornly wants to draw the box
as if for a UTF-8 sequence, not for Unicode, even after
explicit conversion:

  <7 u:'í.oê¸?â"?'
+---------+
|í.oê¸?â"?|
+---------+

  datatype 7 u:'í.oê¸?â"?'
unicode
  #7 u:'í.oê¸?â"?'
3


--- June Kim <[EMAIL PROTECTED]> wrote:

I'm working on the code.

In the meantime, here is the code for calculating display width:

First you need to save the text file at
http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt

===============================================
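require 'regex jfiles'
NB. read the whole data file as one string
t=: 1!:1 <'EastAsianWidth.txt'
NB. single code points (4 hex digits) followed by their width code
point=:'^([0-9A-F]{4});(Na|N|H|A|W|F)' rxmatches t
NB. code point ranges (first..last) followed by their width code
range=:'([0-9A-F]{1,4})\.\.([0-9A-F]{1,4});(Na|N|H|A|W|F)' rxmatches t
NB. store the captured fields (whole-match column dropped) in component files
jcreate 'unidatapoint'
(< }."1 point rxfrom t) jappend 'unidatapoint'
jcreate 'unidatarange'
(< }."1 range rxfrom t) jappend 'unidatarange'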
===============================================

Now you have unidatapoint.ijf and unidatarange.ijf and are able to use them.

===============================================
require 'jfiles'

NB. N  : half
NB. Na : half
NB. H  : half
NB. A  : half
NB. F  : full
NB. W  : full

widthcode=:;: 'N Na H A F W'
pod=:>jread 'unidatapoint';0
rad=:>jread 'unidatarange';0

towc=: widthcode&i. NB. towidthcode

dfh=. 16&#. @ ('0123456789ABCDEF'&i.)
po=:(dfh each {."1 pod),. <"0 towc"0 {:"1 pod
ra=:(,&.>/"1 dfh each 2&{."1 rad),. <"0 towc"0 {:"1 rad
poa=:>{."1 po

fill=: 4 : 0
'r c'=.x
r=. ({.r)+ i. >: -~/ r
({.c) r}y
)

tab=:65536$0 NB. missing is N
tab=:(> {:"1 po) poa} tab
tab=:>./ ra fill"1 tab

diswid=: [: >: [: 4&<: [: {&tab 3&u:@ucp  NB.for rank 1
================================================

To improve performance, you could save tab with jfiles and load it
instead of rebuilding it. You could also use a more compact
representation (3 bits per character, then compress the data).

Usage Example:
   diswid 'í.oê¸?ab!â"?'
2 2 1 1 1 1
   (,:~ ((ucp'-') $~ +/@diswid)) ucp 'í.oê¸?ab!-' NB. properly showing the top line in fixed-pitch font
--------
í.oê¸?ab!-
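
For comparison, here is a rough Python counterpart of the width calculation, using the standard unicodedata module mentioned elsewhere in this thread. It is a sketch, not a translation of the J code: the name display_widths is hypothetical, and it relies on whatever Unicode data ships with the Python in use rather than on a saved copy of EastAsianWidth.txt. As in the J code, F and W count as 2 and everything else as 1.

```python
import unicodedata


def display_widths(text):
    """Per-character display widths: 2 for Fullwidth/Wide, 1 otherwise."""
    return [
        2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
        for ch in text
    ]


text = "\ud55c\uae00ab!"  # two Hangul syllables followed by "ab!"
widths = display_widths(text)
print(widths)               # [2, 2, 1, 1, 1]
print("-" * sum(widths))    # a rule as wide as the text in a fixed-pitch font
print(text)
```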



2007/2/13, Eric Iverson <[EMAIL PROTECTED]>:
> The problem of proper display of boxed unicode data is an interesting
> one. The first step to getting this fixed is for someone to provide a
> working J model that takes an arbitrary boxed argument and produces the
> character stream that properly displays it. If we had such a model we
> might consider incorporating it into the JE.
>
> ----- Original Message -----
> From: "June Kim" <[EMAIL PROTECTED]>
> To: "General forum" <[email protected]>
> Sent: Sunday, February 11, 2007 5:11 AM
> Subject: Re: [Jgeneral] wd 'set ...' with box draw characters
>
>
> > 2007/2/11, Chris Burke <[EMAIL PROTECTED]>:
> >> June Kim wrote:
> > [snip]
> >> > Second, the box is broken with different width characters (that is,
> >> > when the length of bytes of the encoding, and the width of the
> >> > characters on display don't match). What is the usual way of solving
> >> > it in other programming languages? There is a unicode standard for
> >> > character widths. http://unicode.org/reports/tr11/
> >> >
> >> > Python implements that standard (along with others) in the unicodedata
> >> > module.
> >> >
> >> >>>> unicodedata.east_asian_width(u'한')
> >> > 'W'
> >> >>>> unicodedata.east_asian_width(u'a')
> >> > 'Na'
> >> >
> >> > (u specifies the following string is unicode. east_asian_width returns
> >> > the width of the character, not only for east asian characters but all
> >> > unicode characters; it's got a narrow name due to its history)
> >> >
> > [snip]
> >>
> >> If you are having problems with display, it is because of the font, not
> >> because we are not using unicode.
> > [snip]
> >
> > When a string is boxed and the string includes characters whose width
> > differs from their byte length, then the box is broken in J. It
> > is not because of the font. It is because J makes an assumption that
> > every character's width is the same as its byte length, which is
> > obviously false in many writing and encoding systems, including East
> > Asian ones. We can definitely say J's box display isn't internationalized
> > yet.
> >
> > For example, 54620 (in unicode code point) is a Korean character,
> > which is pronounced as "han". Its width is "Wide" (twice as wide as latin
> > alphabets)
> >
> >   han=.4 u: 54620
> >   <han
> > +---+
> > |한|
> > +---+
> >   <8 u: han
> > +---+
> > |한|
> > +---+
> >
> > Since J counts the byte length for determining character's width, and
> > the byte length for han is 3 in UTF-8 ( 3-: #8 u: han ), the box's
> > horizontal character '-' (whose width is "Narrow") is printed three
> > times, and on the display the box is broken.




----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
