UTF-8 is not involved in (1!:2). (1!:2) writes out the data stream in its
internal representation, which for unicode text is UCS-2, i.e. always 2
bytes per character.
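To make the byte layout concrete, here is a small illustration in Python rather than J. It assumes UCS-2 is written in little-endian byte order, as on the Windows x86-64 machine in the quoted report, and it reproduces the bytes that 3 u: fread 'test.txt' shows there.

```python
# Illustration in Python (not J). Assumption: the unicode string is written
# as UCS-2 in little-endian byte order, as in the quoted Windows 64-bit session.
text = "\u2211 1 2 3"          # U+2211 (N-ARY SUMMATION) followed by " 1 2 3"

# For characters in the Basic Multilingual Plane, UCS-2 is byte-for-byte
# identical to UTF-16 without a byte-order mark.
ucs2_le = text.encode("utf-16-le")

print(len(text), len(ucs2_le))  # 7 14  (2 bytes per character)
print(list(ucs2_le))            # [17, 34, 32, 0, 49, 0, 32, 0, 50, 0, 32, 0, 51, 0]
```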

However, (1!:1) cannot know how the text should be interpreted, so it
returns only a plain byte stream. And since the j6/j7/j8 front ends
assume UTF-8, that stream was displayed as UTF-8.
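A Python illustration (again not J, and assuming little-endian UCS-2) of why the result looks wrong: every byte in the file is below 128, so treating it as UTF-8 succeeds without error, but ∑ (bytes 17 34) becomes a control character followed by a double quote, and each ASCII character is followed by a NUL byte. Only decoding the bytes as UCS-2 recovers the original text.

```python
# Illustration in Python (not J): read the UCS-2 bytes back as a plain byte
# stream, as (1!:1) does, then interpret them the way a UTF-8 front end would.
raw = "\u2211 1 2 3".encode("utf-16-le")  # stand-in for the contents of test.txt

as_utf8 = raw.decode("utf-8")   # all bytes are < 128, so no decoding error...
print(repr(as_utf8))            # ...but the sigma is gone: '\x11" \x001\x00 \x002\x00 \x003\x00'

# A reader that knows the file is UCS-2 (UTF-16LE) gets the original back.
as_ucs2 = raw.decode("utf-16-le")
print(as_ucs2 == "\u2211 1 2 3")  # True
```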

There are packages convert/jconv and convert/misc, which contain
utilities for code page conversion.
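For comparison, a Python sketch of a UCS-2 to UTF-8 re-encoding, i.e. the transformation that 8&u: applies to a unicode string before it is written. The function name ucs2le_to_utf8 is hypothetical, little-endian byte order is assumed, and this does not reflect the API of convert/jconv or convert/misc. The output matches the 3 u: 8 u: l line in the quoted session.

```python
# Illustration in Python (not J); ucs2le_to_utf8 is a hypothetical helper.
def ucs2le_to_utf8(data: bytes) -> bytes:
    """Re-encode UCS-2 bytes (little-endian, no BOM) as UTF-8."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    return data.decode("utf-16-le").encode("utf-8")

ucs2 = "\u2211 1 2 3".encode("utf-16-le")
print(list(ucs2le_to_utf8(ucs2)))  # [226, 136, 145, 32, 49, 32, 50, 32, 51]
```

Note that ASCII characters shrink back to one byte each, while ∑ expands to three.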

Mon, 25 Mar 2013, Don Guinn wrote:
> Looks like fwrite (1!:2) has been modified after J6 to convert unicode
> (DBCS) text to UTF-8. It doesn't do it quite right. If it encounters a
> character that needs to be converted to UTF-8 it does so properly; however,
> ASCII characters (128{.a) are padded to two characters with a zero byte.
> The ASCII characters should be written out without padding. Or the
> non-ASCII characters should be written out as-is like in J6.
> 
> This makes a confusing mess, as fread does not convert the UTF-8 characters
> automatically. Nor should it, since it must be able to read any file type,
> including files whose bytes may look like UTF-8 but are not.
> 
> fwrite should not attempt to convert unicode to UTF-8 as it writes, since one
> may really want to create a DBCS file. Unicode text can still be written
> out as UTF-8 if the user so chooses, simply by applying 8&u: before writing.
> 
> If it is felt that people should be able to automatically convert between
> unicode and UTF-8 when reading and writing files then there should be new
> read and write options added to the file conjunction leaving the old ones
> alone.
> 
> This fails in J8 64-bit and J7 64-bit under Windows 7. I have not tried
> 32-bit.
> 
>    JVERSION
> Engine: j701/2011-01-10/11:25
> Library: 8.01.008
> Qt IDE: 1.0.3
> Platform: Win 64
> Installer: j801 beta install
> InstallPath: c:/j/j64-801a
>    ]l=:(u:16b2211),' 1 2 3' NB. 16b2211 is Unicode sigma.
> ∑ 1 2 3
>    $l
> 7
>    l fwrite 'test.txt'
> 7
>    fread 'test.txt'
> ┬" 1 2 3
>    3 u: fread 'test.txt'
> 17 34 32 0 49 0 32 0 50 0 32 0 51 0
>    3 u: l
> 8721 32 49 32 50 32 51
>    3 u: 8 u: l
> 226 136 145 32 49 32 50 32 51
> 
> By the way, I copied and pasted the above from the term window, where all
> input lines were indented 3 spaces. For some reason the indentation is lost
> in the paste after the first line.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3