Re: Avoid the Yen Sign [Was: Re: new sigil]
On Sun, Oct 23, 2005 at 10:55:34PM +0900, Dan Kogai wrote:
: To make the matter worse, there is not just one yen sign in
: Unicode.  Take a look at this.
:
: ¥ U+00A5 YEN SIGN
: ￥ U+FFE5 FULLWIDTH YEN SIGN
:
: Though they look and grok the same to humans, computers handle them
: differently.  This happened when the Unicode Consortium decided to make
: the BMP round-trippable against legacy encodings.  They were distinct in
: the JIS standards, so they are distinct in Unicode.
:
: Maybe we should avoid other symbols like this for sigils -- those not
: in ASCII that have 'fullwidth' variations.  q($) and q(\) are okay
: (or too late) because they are already in ASCII.  q(¥) should be
: avoided because you can hardly tell the difference from q(￥) in the
: display.
:
: But this will also outlaw the cent sign.  I have attached a list of
: those affected.  As you see, most are with ASCII equivalents but some
: are not.

We'd have to outlaw A..Z as well.  :-)

I think a better plan might just be to say that we'll treat any
fullwidth character as equivalent to its narrow companion, at least
when used as an operator.  Canonicalizing identifiers may be another
matter though.

On the other hand, certain of the double-width characters are likely
to be confused with two singles, such as

    ＝ FF1D FULLWIDTH EQUALS SIGN
    ＿ FF3F FULLWIDTH LOW LINE

so maybe they should be equivalent to == and __, or outlawed.  And one
could (un)reasonably argue that

    ～ FF5E FULLWIDTH TILDE

ought to mean ~~ rather than ~.

But in general we need to go slow on such decisions.  For now just
sticking our toe into Latin-1 is enough, as long as we're looking
ahead for visual pitfalls.

As for the ¥ pitfall, so far we've intentionally been careful to use it
only where an operator is expected, whereas \ is legal only where a
term is expected.  So at least for Perl code, we can translate legacy ¥
to different codepoints.  (Whether the Japanese font distinguishes them
is another issue, of course.  I have a Unicode font on my machine that
prints backslash as ¥, which I find slightly irritating, but doubtless
will be par for the course in Japan for the foreseeable future.  Maybe
that's a good reason to allow the doublewidth backslash as an alias for
normal backslash.  Maybe not.)

Anyway, I think people will be able to distinguish visually between
A ¥ B and ¥X as long as we keep the operator/term distinction.

Larry
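
A rough illustration of the "treat any fullwidth character as equivalent to its narrow companion" idea, written in Python rather than Perl 6 and not taken from any Perl 6 implementation. It relies on the Unicode Character Database tagging fullwidth forms with a `<wide>` compatibility decomposition; the function names `narrow` and `fold_fullwidth` are hypothetical. Note that it maps ＝ to a single =, so the "equivalent to ==" question raised above is left open.

```python
import unicodedata


def narrow(ch: str) -> str:
    """Return the narrow companion of a fullwidth character, else ch itself.

    Fullwidth forms carry a decomposition such as '<wide> 00A5' in the
    Unicode Character Database; anything without the <wide> tag is left alone.
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<wide> "):
        # Drop the '<wide>' tag and rebuild the target from its hex code points.
        return "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    return ch


def fold_fullwidth(text: str) -> str:
    """Fold every fullwidth character in text to its narrow companion."""
    return "".join(narrow(ch) for ch in text)


if __name__ == "__main__":
    # U+FFE5 FULLWIDTH YEN SIGN, U+FF1D FULLWIDTH EQUALS SIGN,
    # U+FF3F FULLWIDTH LOW LINE, U+FF5E FULLWIDTH TILDE
    for ch in "\uffe5\uff1d\uff3f\uff5e":
        folded = narrow(ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(folded):04X}")
```

Under these assumptions, `fold_fullwidth("A \uffe5 B")` yields the same string as `"A \u00a5 B"`. A tokenizer could apply such folding only where an operator is expected and leave identifiers untouched, which keeps the identifier question separate.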
Re: Avoid the Yen Sign [Was: Re: new sigil]
: On 10/23/05, Autrijus Tang [EMAIL PROTECTED] wrote:
: In addition to your handy table, the « and » French quotes, which are used
: quite heavily in Perl 6 for both bracketing and hyper operators, also have
: full width equivalents:
:
: 300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
: 300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET
:
: Half width: «»
: Full width: 《》

I think we actually speculated about that identity in the Apocalypse.

: One way to approach it is to make Perl 6 accept both full- and
: half-width variants.
:
: Another way would be to use ASCII fallbacks exclusively in real programs, and
: reserve Unicode variants for pretty-printing, the same way that PLT Scheme and
: Haskell recognize λ in the literature, but actually write lambda and
: \ respectively in everyday coding.

I think we should enable both approaches.  Restricting Unicode
characters to literature is wrong, but so is forcing Unicode on
someone prematurely.

On Sun, Oct 23, 2005 at 07:07:33PM -0400, Rob Kinyon wrote:
: Isn't this starting to be the question of why we have the Unicode
: operators instead of just functions? Would it be possible to have a
: function be infix?

At which precedence level?

Larry
RE: Avoid the Yen Sign [Was: Re: new sigil]
On Tue, 25 Oct 2005, Larry Wall wrote:
> As for the ¥ pitfall, so far we've intentionally been careful to use it
> only where an operator is expected, whereas \ is legal only where a
> term is expected.  So at least for Perl code, we can translate legacy ¥
> to different codepoints.  (Whether the Japanese font distinguishes them
> is another issue, of course.  I have a Unicode font on my machine that
> prints backslash as ¥, which I find slightly irritating, but doubtless
> will be par for the course in Japan for the foreseeable future.  Maybe
> that's a good reason to allow the doublewidth backslash as an alias for
> normal backslash.  Maybe not.)

BTW, the exact same thing happens with the Won sign ₩ on Korean Windows
systems; it is also mapped to 0x5c in the default codepage, and paths are
displayed with the Won sign instead of the backslash as separators.

Just something to keep in mind in case you are tempted to use the Won
sign as a sigil or operator in the future.

Cheers,
-Jan
Re: Avoid the Yen Sign [Was: Re: new sigil]
Jan Dubois skribis 2005-10-25 12:33 (-0700):
> Just something to keep in mind in case you are tempted to use the Won
> sign as a sigil or operator in the future.

I don't know what stitch() will do, but this will have to be its infix
operator :)

    zip     ¥    Y
    stitch  Won  w

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html
Re: Avoid the Yen Sign [Was: Re: new sigil]
Dan Kogai wrote:
> To make the matter worse, there is not just one yen sign in
> Unicode.  Take a look at this.
>
> ¥ U+00A5 YEN SIGN
> ￥ U+FFE5 FULLWIDTH YEN SIGN
>
> Though they look and grok the same to humans, computers handle them
> differently.  This happened when the Unicode Consortium decided to make
> the BMP round-trippable against legacy encodings.  They were distinct in
> the JIS standards, so they are distinct in Unicode.

In addition to your handy table, the « and » French quotes, which are used
quite heavily in Perl 6 for both bracketing and hyper operators, also have
full width equivalents:

    300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
    300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET

    Half width: «»
    Full width: 《》

There is no way to type out the half-width yen and double angle brackets
under MSWin32, under either the traditional or simplified code pages; only
full width variants are available.

One way to approach it is to make Perl 6 accept both full- and half-width
variants.

Another way would be to use ASCII fallbacks exclusively in real programs, and
reserve Unicode variants for pretty-printing, the same way that PLT Scheme and
Haskell recognize λ in the literature, but actually write lambda and
\ respectively in everyday coding.

TIMTOWTDI. :)

Thanks,
/Autrijus/
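
One wrinkle for the "accept both variants" approach: unlike the fullwidth yen sign, 《 and 》 carry no compatibility decomposition in the Unicode Character Database, so Unicode normalization will not turn them into « and ». A minimal Python sketch of what accepting both would then require follows. The `BRACKET_ALIASES` table and the `canonical_bracket` function are hypothetical, and the pairing is a language-design choice for illustration, not something Unicode or Perl 6 defines.

```python
import unicodedata

# Hypothetical alias table: Unicode does not relate these characters,
# so a language that wants to accept both has to declare the pairing itself.
BRACKET_ALIASES = {
    "\u300a": "\u00ab",  # LEFT DOUBLE ANGLE BRACKET  -> «
    "\u300b": "\u00bb",  # RIGHT DOUBLE ANGLE BRACKET -> »
}


def canonical_bracket(ch: str) -> str:
    """Map a CJK double angle bracket to its Latin-1 guillemet, else return ch."""
    return BRACKET_ALIASES.get(ch, ch)


if __name__ == "__main__":
    for ch in "\u300a\u300b":
        # Compatibility normalization leaves the CJK brackets unchanged ...
        unchanged = unicodedata.normalize("NFKC", ch) == ch
        # ... so only the explicit table maps them to the guillemets.
        mapped = canonical_bracket(ch)
        print(f"U+{ord(ch):04X} NFKC-stable={unchanged} alias=U+{ord(mapped):04X}")
```

The same explicit-table approach would apply to any other look-alike pair that the standard does not connect.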
Re: Avoid the Yen Sign [Was: Re: new sigil]
On 10/23/05, Autrijus Tang [EMAIL PROTECTED] wrote:
> Dan Kogai wrote:
> > To make the matter worse, there is not just one yen sign in
> > Unicode.  Take a look at this.
> >
> > ¥ U+00A5 YEN SIGN
> > ￥ U+FFE5 FULLWIDTH YEN SIGN
> >
> > Though they look and grok the same to humans, computers handle them
> > differently.  This happened when the Unicode Consortium decided to make
> > the BMP round-trippable against legacy encodings.  They were distinct in
> > the JIS standards, so they are distinct in Unicode.
>
> In addition to your handy table, the « and » French quotes, which are used
> quite heavily in Perl 6 for both bracketing and hyper operators, also have
> full width equivalents:
>
>     300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
>     300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET
>
>     Half width: «»
>     Full width: 《》
>
> There is no way to type out the half-width yen and double angle brackets
> under MSWin32, under either the traditional or simplified code pages; only
> full width variants are available.
>
> One way to approach it is to make Perl 6 accept both full- and half-width
> variants.
>
> Another way would be to use ASCII fallbacks exclusively in real programs, and
> reserve Unicode variants for pretty-printing, the same way that PLT Scheme and
> Haskell recognize λ in the literature, but actually write lambda and
> \ respectively in everyday coding.

Isn't this starting to be the question of why we have the Unicode
operators instead of just functions? Would it be possible to have a
function be infix?

Rob