UTF-8 is not involved in (1!:2). (1!:2) writes out the internal
representation of the data, which is UCS-2, i.e. always 2 bytes per
character.
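As a rough check of the 2-bytes-per-character point, the sketch below
rebuilds the byte values by hand from the code points of the test string
in the quoted session. It assumes the low byte comes first (little-endian),
which is what the quoted `3 u: fread` output shows on Win 64; other
platforms may differ.

```j
   l =: (u: 16b2211),' 1 2 3'     NB. same test string as in the quoted session
   256 256 #: 16b2211             NB. high byte, low byte of the sigma code point
34 17
   , |."1 (256 256) #: 3 u: l     NB. low byte first for each character, then ravel
17 34 32 0 49 0 32 0 50 0 32 0 51 0
```

This matches the `3 u: fread 'test.txt'` line in the quoted session: the
zero bytes are the high halves of the ASCII characters, not padding added
by a UTF-8 conversion.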
However, (1!:1) cannot know in which format the text should be
interpreted, so it returns only a plain byte stream. And since the
J6/J7/J8 front ends assume UTF-8, that stream was displayed as if it
were UTF-8.
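A minimal sketch of handling the encoding explicitly on both sides,
untested here. It relies on the `u:` conversions as I understand them from
the J vocabulary (`6 u:` pairs bytes up into 2-byte characters, `7 u:`
decodes UTF-8, `8 u:` encodes to UTF-8) and assumes `fwrite`/`fread` from
the standard files script are loaded. The file name `test8.txt` is only an
example.

```j
   l =: (u: 16b2211),' 1 2 3'
   l fwrite 'test.txt'              NB. raw 2-byte characters, as in the quoted session
   6 u: fread 'test.txt'            NB. pair the bytes back up into 2-byte characters
   (8 u: l) fwrite 'test8.txt'      NB. encode to UTF-8 first, then write
   7 u: fread 'test8.txt'           NB. decode the UTF-8 bytes after reading
```

Either way the caller decides the encoding, and (1!:1)/(1!:2) only ever
move bytes.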
The packages convert/jconv and convert/misc contain utilities for
code page conversion.
Mon, 25 Mar 2013, Don Guinn wrote:
> Looks like fwrite (1!:2) has been modified after J6 to convert unicode
> (DBCS) text to UTF-8. It doesn't do it quite right. If it encounters a
> character that needs to be converted to UTF-8 it does so properly; however,
> ASCII characters (128{.a) are padded to two characters with a zero byte.
> The ASCII characters should be written out without padding. Or the
> non-ASCII characters should be written out as-is like in J6.
>
> This makes a confusing mess as fread does not convert the UTF-8 characters
> automatically. And it shouldn't as it should be able to read any file type
> where bytes may look like UTF-8 but are not.
>
> fwrite should not attempt to convert unicode to UTF-8 as it writes as one
> may really want to create a DBCS file. unicode text can still be written
> out as UTF-8 if the user so chooses by simply applying 8&u: before writing.
>
> If it is felt that people should be able to automatically convert between
> unicode and UTF-8 when reading and writing files then there should be new
> read and write options added to the file conjunction leaving the old ones
> alone.
>
> This fails in J8 64 bit and J7 64 bit under Windows 7. Have not tried 32
> bit.
>
> JVERSION
>
> Engine: j701/2011-01-10/11:25
>
> Library: 8.01.008
>
> Qt IDE: 1.0.3
>
> Platform: Win 64
>
> Installer: j801 beta install
>
> InstallPath: c:/j/j64-801a
>
> ]l=:(u:16b2211),' 1 2 3' NB. 16b2211 is Unicode sigma.
>
> ∑ 1 2 3
>
> $l
>
> 7
>
> l fwrite 'test.txt'
>
> 7
>
> fread 'test.txt'
>
> ┬" 1 2 3
>
> 3 u: fread 'test.txt'
>
> 17 34 32 0 49 0 32 0 50 0 32 0 51 0
>
> 3 u: l
>
> 8721 32 49 32 50 32 51
>
> 3 u: 8 u: l
>
> 226 136 145 32 49 32 50 32 51
>
>
> By the way. I copied and pasted the above from the term window where all
> input lines were indented 3 spaces. For some reason the indention is lost
> in the paste after the first line.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3