Re: [Jbeta] Definitive UTF-8 to Unicode conversion for u:

Oleg Kobchenko Tue, 06 Feb 2007 23:02:40 -0800

--- Chris Burke <[EMAIL PROTECTED]> wrote:

> Oleg Kobchenko wrote:
> > Despite the multitude of function selectors in u: verb,
> > it's not clear how it is possible to convert from UTF-8
> > to wchar (Unicode) in one step.
> > 
> > 7 u: converts to EITHER char OR wchar.
> > 
> > But it seems that a fairly common use is to supply wchar (Unicode)
> > argument to DLL call or to obtain a binary form of wchar
> > (which per se is not clear how to get: byte array of wchars),
> > where the argument is UTF-8, but could be ASCII, like word "Test".
> > 
> > But for word "Test", 7 u: does not work:
> > 
> >    datatype 7 u: 'Test'   NB. Western
> > literal
> >    datatype 7 u: 'Тест'   NB. Cyrillic
> > unicode
> > 
> > So the workaround is 4 u: 3 u: 7 u: (three verbs)
> > 
> >    datatype 4 u: 3 u: 7 u: 'Test'
> > unicode
> > 
> >    4 u: 3 u: 7 u: 'Тест'
> > Тест
> > 
> > Is this EITHER / OR in 7 u: really needed?
> > 
> > Why not just always yield wchar?
> 
> It is useful because the result of 7 u: is in its simplest form, i.e.
> the conversion to 2 byte unicode takes place only if necessary. Bill's
> solution is recommended.


Monadic u: just blindly appends a zero byte to each char,
which does not work with UTF-8, where a single character
may span several bytes.
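
To make the zero-padding point concrete, here is a minimal sketch.
It is written in Python rather than J, purely to show the byte-level
behavior, and it does not model the u: verb itself. The helper names
widen_bytes, decode_utf8 and to_wchar_bytes are hypothetical, and
UTF-16LE is assumed as the wchar byte layout expected by the DLL.

```python
def widen_bytes(data: bytes) -> str:
    """Treat every byte as its own code point (naive zero-extension)."""
    return "".join(chr(b) for b in data)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8; the result is wide text even for pure ASCII input."""
    return data.decode("utf-8")


def to_wchar_bytes(data: bytes) -> bytes:
    """UTF-8 in, 2-byte little-endian code units out (assumed DLL layout)."""
    return decode_utf8(data).encode("utf-16-le")


ascii_utf8 = "Test".encode("utf-8")
cyrillic_utf8 = "\u0422\u0435\u0441\u0442".encode("utf-8")  # Cyrillic "Test"

# For ASCII the two approaches agree.
assert widen_bytes(ascii_utf8) == decode_utf8(ascii_utf8)

# For Cyrillic, widening yields one character per byte (8),
# while decoding yields the intended 4 characters.
assert len(widen_bytes(cyrillic_utf8)) == 8
assert len(decode_utf8(cyrillic_utf8)) == 4
```

In this sketch the result type does not depend on the input data:
both 'Test' and the Cyrillic word come back as wide text in one step.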

As a result we have an interface that is too smart
and makes complex decisions in its simplest form,
and for a simple direct transformation it requires
a complex application of 3 verbs.

The default behavior of u: is unpredictable:
depending on the incoming data, it can return
either of two possible datatypes.

It's OK to use the workaround, but it's not
evident where such flexibility would be useful.



 
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm