Raul wrote:
> Rather than give you a screenshot, here's what inspired my original message:
>
> https://twitter.com/0xabad1dea/status/509748597668446208

When I want to embed Unicode reliably in my J programs, I typically spell
out the codepoints as numeric constants, using "16bXXXX" in place of
"U+XXXX" (bearing in mind that hexadecimal digits in J's constant notation
must be lowercase; that is, 16b01ab, not 16b01AB).

Then I experiment with combinations of ucp, utf8, 3&u: and 4&u: until I get
the results I expect (at the very least in terms of the length of the
string, which may not match the number of distinct characters a human
would identify, but should at least be [significantly] lower than the
number of UTF-8 bytes).
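
As a small sketch of that kind of length check (the names cp, wide and bytes are placeholders I made up for illustration; ucp and utf8 are the standard library verbs mentioned above):

```j
NB. Illustrative names only: cp, wide and bytes are not library names.
cp    =: 16b0075 16b035e 16b0319   NB. U+0075 U+035E U+0319, lowercase hex digits
wide  =: 4 u: cp                   NB. integers -> 2-byte (unicode) characters
bytes =: utf8 wide                 NB. the same text encoded as UTF-8

# wide                 NB. 3 characters: a base letter plus two combining marks
# bytes                NB. 5 bytes: 1 + 2 + 2
cp -: 3 u: ucp bytes   NB. 1 if decoding the bytes recovers the code points
```

A reader would see a single glyph here, so neither count matches the visual character count, but the unicode length is still well below the byte length.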

Using your example, I might do something along the lines below.

-Dan

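NB. printf supplies sprintf; browser.ijs is needed for launch_jbrowser_
require 'printf ~system\extras\util\browser.ijs'

NB. Code points as numbers: turn the LFs into spaces, parse with 0 ". and
NB. convert the resulting integers to unicode characters with 4 u:
UC =: ucp 4 u: 0 ". LF&=`(,:& ' ')} noun define -. TAB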
16b0075 16b035e 16b0319 16b031d 16b0359 16b0317
16b0317 16b0356 16b006e 16b0069 16b033b 16b0063
16b0338 16b0318 16b033c 16b032d 16b0356 16b032c
16b0332 16b006f 16b035e 16b0064 16b0328 16b032e
16b033a 16b032a 16b0329 16b0356 16b0065 16b0358
16b032b 16b0323 16b0020
)
HTML =: noun define -. TAB,CR
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Unicode test page</title>
</head>
<body>
%s
</body>
</html>
)

NB. Fill the %s slot with the UTF-8 bytes, write the page, and open it
FN =: jpath '~temp\unicode.html'
FN fwrite~ HTML sprintf < utf8 UC
launch_jbrowser_ FN
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm