[Jbeta] U8 and unicode

Don Guinn Wed, 29 May 2013 07:53:11 -0700

The default conversion of literal to unicode (char to wchar) does not work
as one would expect. Simply setting the high-order byte to zero is like
converting integer to floating point by simply copying the bits as is.
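
A minimal sketch of the difference, assuming the three bytes 226 136 145
(the U8 encoding of the character 16b2211; the name s is only for
illustration):

   s=.226 136 145{a.
   3 u: 2 u: s
226 136 145
   3 u: 7 u: s
8721

The default widening (2 u:) turns each byte into its own wchar, while 7 u:
decodes the three bytes into the single character they encode.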


J favors representing Unicode as U8 rather than as unicode (wchar). Not a
problem. And normally, if a literal contains characters from _128{.a. (the
upper half of the alphabet), they represent U8 characters. So why not have
the default conversion of literal to unicode treat the literal as if it
might be U8? That is, make monadic u: default to 7&u: when applied to a
literal and, when concatenating literal and unicode, assume that the
literal is U8.

Having monadic u: default to 2&u: is not really a problem in itself, but
concatenation with the current default can easily produce errors.

   z=.u:16b2211
   3!:0 z
131072
   3!:0 ":z
2
   z,":z
∑â
   3 u: z,":z
8721 226 136 145
   3 u: z,7 u: ":z
8721 8721


It seems to me that the last result, 8721 8721, is what one would expect
from plain concatenation.


J8 is soon to be released. Although the way J handles U8 and unicode has
been around for quite a while, this might be a good time to change the
defaults, if you agree that the change would be good.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
