Re: new sigil
Autrijus wrote:
> Indeed. Somehow I think this makes some sense:
>
>     sub Bool eqv (|T $x, |T $y) { ... }

Except that it prevents anyone from ever writing:

    multi sub circumfix:<| |> (Num $x) { return abs $x }
    multi sub circumfix:<| |> (Vec $x) { return $x.mag }

which many mathematically inclined folks might find annoying.

(It also precludes intriguing possibilities like:

    multi sub circumfix:«| >» ($q) { return Quantum::State.new(val => $q) }

which I personally would find irritating. ;-)

Damian
Re: new sigil
Luke Palmer wrote:
> limited access to system settings. And in those kinds of corporate
> environments, you're not going to be working with any code but code
> written in-house. Which means that nobody is going to be using
> Latin-1, and everyone will be using the ASCII synonyms. What's the
> problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings
> changed.

Right. We rely on Perl libraries from CPAN and elsewhere. You have to
make sure that the code you are looking at has been transferred only
through UTF-8-aware systems. Deciding to use the ASCII synonyms
ourselves does not make us safe. We also have to be sure that every
module that happens to use Unicode sigils or operators is installed
without passing through legacy systems.

An explanation of the situation in Japan follows. Those who are not
interested in Japan can skip it. This problem seems to be unique to
Japan.

(It has already been a year since the yen sign became the zip
operator. I am not trying to start an argument; this is just me
whining. :P)

The problem lies not in writing code but in carrying files around.

- You cannot tell whether a text file is in US-ASCII, UTF-8, or
  ShiftJIS when all the code points are below 0x7f. By the time you
  receive a code snippet from a colleague by mail, it is too late.
- If we convert a yen sign from Latin-1 (0xa5) to Unicode (UTF-8:
  c2 a5) and then to the default coding system, which is believed to
  be ASCII but is actually ShiftJIS, it becomes 0x5c. After that there
  is no way to tell whether the byte was originally a backslash or a
  yen sign.

Grepping for yen signs doesn't help, because by the time you run grep
they are already backslashes.

If we find a lot of yen signs used as zip operators in the standard
library, Japanese users will face a big question: give up either Perl6
or Windows. Which do we need? And I suppose the answer would be "We
have a lot of substitutes for Perl6: Ruby, Perl5, etc."
In [EMAIL PROTECTED] Larry wrote:
> (Of course, we'll leave out the little problem that half the people
> in Japan would read it as a backslash wannabe...that's not really a
> problem since a zipper would only be used where an operator is
> expected, and backslash is illegal there (so far).)

It is not the people who read a yen sign as a backslash, but the
legacy systems. We might define backslash as a synonym for the zip op,
but that is too risky. In Japan, yen as zip carries the same magnitude
of risk.

--
Kaoru Maeda [EMAIL PROTECTED]
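The byte-level problem described in this message can be sketched in Python. This is only an illustration under stated assumptions: `to_legacy` is a hypothetical stand-in for a legacy converter that follows the JIS X 0201 Roman layout (yen at 0x5c), not a real library codec, and the `@a ... @b` snippets are made-up source fragments.

```python
# Hypothetical sketch of the lossy conversion: a yen sign and a backslash
# end up as the same byte once the text passes through a legacy system.

YEN = "\u00a5"     # U+00A5 YEN SIGN
BACKSLASH = "\\"   # U+005C REVERSE SOLIDUS


def to_legacy(text: str) -> bytes:
    """Encode text the way a JIS X 0201 Roman style system would (assumed)."""
    out = bytearray()
    for ch in text:
        if ch == YEN:
            out.append(0x5C)        # the yen sign occupies the 0x5c slot
        elif ord(ch) < 0x80:
            out.append(ord(ch))     # the rest of the 7-bit range passes through
        else:
            raise ValueError(f"not representable in this sketch: {ch!r}")
    return bytes(out)


# The starting points quoted in the message.
assert YEN.encode("latin-1") == b"\xa5"
assert YEN.encode("utf-8") == b"\xc2\xa5"

zip_src = f"@a {YEN} @b"        # yen used as the zip operator
ref_src = f"@a {BACKSLASH} @b"  # a real backslash in the same position

legacy_zip = to_legacy(zip_src)
legacy_ref = to_legacy(ref_src)

# Both sources collapse to identical bytes, all below 0x80, so the file
# decodes cleanly as ASCII or as UTF-8 and nothing flags the damage.
print(legacy_zip == legacy_ref)
print(legacy_zip.decode("ascii") == legacy_zip.decode("utf-8"))

# "Grepping for yen signs" after the fact finds nothing.
print(YEN.encode("utf-8") in legacy_zip)
```

Once the two inputs have collapsed to the same bytes, no later tool can recover which character the author typed, which is the point made about grep above.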
Re: new sigil
Luke Palmer wrote:
> limited access to system settings. And in those kinds of corporate
> environments, you're not going to be working with any code but code
> written in-house. Which means that nobody is going to be using
> Latin-1, and everyone will be using the ASCII synonyms. What's the
> problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings
> changed.

Right. We rely on Perl libraries from CPAN and elsewhere. You have to
make sure that the code you are looking at has been transferred only
through UTF-8-aware systems. Deciding to use the ASCII synonyms
ourselves does not make us safe. We also have to be sure that every
module that happens to use Unicode sigils or operators is installed
without passing through legacy systems.

An explanation of the situation in Japan follows. Those who are not
interested in Japan can skip it. This problem seems to be unique to
Japan.

It has already been a year since the yen sign became the zip operator.
I am not trying to start a discussion; this is just me whining. :P

The ancient ISO 646 standard allowed national variants that replace
certain ASCII characters with local symbols. Currency signs were the
first candidates for this.

http://en.wikipedia.org/wiki/ISO_646

This legacy convention is still alive in Japan in the JIS/ShiftJIS
encodings. I hope Unicode supersedes them and the backslash-yen
confusion disappears, but the transition is not happening quickly
enough.

The problem lies not in writing code but in carrying files around.

- You cannot tell whether a text file is in US-ASCII, UTF-8, or
  ShiftJIS when all the code points are below 0x7f. By the time you
  receive a code snippet from a colleague by mail, it is too late.
- If we convert a yen sign from Latin-1 (0xa5) to Unicode (UTF-8:
  c2 a5) and then to the default coding system, which is believed to
  be ASCII but is actually ShiftJIS, it becomes 0x5c. After that there
  is no way to tell whether the byte was originally a backslash or a
  yen sign.
Grepping for yen signs doesn't help, because by the time you run grep
they are already backslashes.

If we find a lot of yen signs used as zip operators in the standard
library, we face a big question: give up either Perl6 or Windows.
Which do we abandon? And I suppose the answer would be "We have a lot
of substitutes for Perl6: Ruby, Perl5, etc."

In Japan, yen is a synonym for backslash. We wish to retain this
legacy. The zip operator is far less important than the regex escape,
the string escape, and the take-reference operator.

--
Kaoru Maeda [EMAIL PROTECTED]
Avoid the Yen Sign [Was: Re: new sigil]
Maeda-san and the list members,

Thank you for raising this issue, and sorry for not raising it myself.

On Oct 22, 2005, at 19:42, Kaoru Maeda wrote:
> If we find a lot of yen signs used as zip operators in the standard
> library, we face a big question: give up either Perl6 or Windows.
> Which do we abandon? And I suppose the answer would be "We have a
> lot of substitutes for Perl6: Ruby, Perl5, etc."
>
> In Japan, yen is a synonym for backslash. We wish to retain this
> legacy. The zip operator is far less important than the regex
> escape, the string escape, and the take-reference operator.

To make matters worse, there is not just one yen sign in Unicode. Take
a look at this.

    ¥  U+00A5  YEN SIGN
    ￥  U+FFE5  FULLWIDTH YEN SIGN

Though they look and read the same to humans, computers handle them
differently. This happened when the Unicode Consortium decided to make
the BMP round-trippable against legacy encodings. The two were
distinct in the JIS standards, so they became distinct in Unicode too.

Maybe we should avoid other symbols like this for sigils -- those not
in ASCII that have 'fullwidth' variations. q($) and q(\) are okay (or
too late) because they are already in ASCII. q(¥) should be avoided
because on screen you can hardly tell it apart from q(￥). But this
will also outlaw the cent sign.

I have attached a list of those affected. As you can see, most have
ASCII equivalents but some do not.

Dan the Man with Too Many Signs to Deal With

% grep FULLWIDTH /usr/local/lib/perl5/5.8.7/unicore/Name.pl | perl -Mencoding=utf8 -aple '$_=chr(hex($F[0]))."\t".$_'
！  FF01  FULLWIDTH EXCLAMATION MARK
＂  FF02  FULLWIDTH QUOTATION MARK
＃  FF03  FULLWIDTH NUMBER SIGN
＄  FF04  FULLWIDTH DOLLAR SIGN
％  FF05  FULLWIDTH PERCENT SIGN
＆  FF06  FULLWIDTH AMPERSAND
＇  FF07  FULLWIDTH APOSTROPHE
（  FF08  FULLWIDTH LEFT PARENTHESIS
）  FF09  FULLWIDTH RIGHT PARENTHESIS
＊  FF0A  FULLWIDTH ASTERISK
＋  FF0B  FULLWIDTH PLUS SIGN
，  FF0C  FULLWIDTH COMMA
－  FF0D  FULLWIDTH HYPHEN-MINUS
．  FF0E  FULLWIDTH FULL STOP
／  FF0F  FULLWIDTH SOLIDUS
０  FF10  FULLWIDTH DIGIT ZERO
１  FF11  FULLWIDTH DIGIT ONE
２  FF12  FULLWIDTH DIGIT TWO
３  FF13  FULLWIDTH DIGIT THREE
４  FF14  FULLWIDTH DIGIT FOUR
５  FF15  FULLWIDTH DIGIT FIVE
６  FF16  FULLWIDTH DIGIT SIX
７  FF17  FULLWIDTH DIGIT SEVEN
８  FF18  FULLWIDTH DIGIT EIGHT
９  FF19  FULLWIDTH DIGIT NINE
：  FF1A  FULLWIDTH COLON
；  FF1B  FULLWIDTH SEMICOLON
＜  FF1C  FULLWIDTH LESS-THAN SIGN
＝  FF1D  FULLWIDTH EQUALS SIGN
＞  FF1E  FULLWIDTH GREATER-THAN SIGN
？  FF1F  FULLWIDTH QUESTION MARK
＠  FF20  FULLWIDTH COMMERCIAL AT
Ａ  FF21  FULLWIDTH LATIN CAPITAL LETTER A
Ｂ  FF22  FULLWIDTH LATIN CAPITAL LETTER B
Ｃ  FF23  FULLWIDTH LATIN CAPITAL LETTER C
Ｄ  FF24  FULLWIDTH LATIN CAPITAL LETTER D
Ｅ  FF25  FULLWIDTH LATIN CAPITAL LETTER E
Ｆ  FF26  FULLWIDTH LATIN CAPITAL LETTER F
Ｇ  FF27  FULLWIDTH LATIN CAPITAL LETTER G
Ｈ  FF28  FULLWIDTH LATIN CAPITAL LETTER H
Ｉ  FF29  FULLWIDTH LATIN CAPITAL LETTER I
Ｊ  FF2A  FULLWIDTH LATIN CAPITAL LETTER J
Ｋ  FF2B  FULLWIDTH LATIN CAPITAL LETTER K
Ｌ  FF2C  FULLWIDTH LATIN CAPITAL LETTER L
Ｍ  FF2D  FULLWIDTH LATIN CAPITAL LETTER M
Ｎ  FF2E  FULLWIDTH LATIN CAPITAL LETTER N
Ｏ  FF2F  FULLWIDTH LATIN CAPITAL LETTER O
Ｐ  FF30  FULLWIDTH LATIN CAPITAL LETTER P
Ｑ  FF31  FULLWIDTH LATIN CAPITAL LETTER Q
Ｒ  FF32  FULLWIDTH LATIN CAPITAL LETTER R
Ｓ  FF33  FULLWIDTH LATIN CAPITAL LETTER S
Ｔ  FF34  FULLWIDTH LATIN CAPITAL LETTER T
Ｕ  FF35  FULLWIDTH LATIN CAPITAL LETTER U
Ｖ  FF36  FULLWIDTH LATIN CAPITAL LETTER V
Ｗ  FF37  FULLWIDTH LATIN CAPITAL LETTER W
Ｘ  FF38  FULLWIDTH LATIN CAPITAL LETTER X
Ｙ  FF39  FULLWIDTH LATIN CAPITAL LETTER Y
Ｚ  FF3A  FULLWIDTH LATIN CAPITAL LETTER Z
［  FF3B  FULLWIDTH LEFT SQUARE BRACKET
＼  FF3C  FULLWIDTH REVERSE SOLIDUS
］  FF3D  FULLWIDTH RIGHT SQUARE BRACKET
＾  FF3E  FULLWIDTH CIRCUMFLEX ACCENT
＿  FF3F  FULLWIDTH LOW LINE
｀  FF40  FULLWIDTH GRAVE ACCENT
ａ  FF41  FULLWIDTH LATIN SMALL LETTER A
ｂ  FF42  FULLWIDTH LATIN SMALL LETTER B
ｃ  FF43  FULLWIDTH LATIN SMALL LETTER C
ｄ
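The observation that most fullwidth characters have ASCII equivalents but some do not can be checked with a short Python sketch. It swaps the Name.pl grep for Python's standard `unicodedata` module, and it treats "has an ASCII equivalent" as "NFKC-normalizes to pure ASCII", which is an assumption of this sketch rather than something stated in the thread. The helper name `fullwidth_rows` is made up.

```python
import unicodedata


def fullwidth_rows():
    """Collect (char, code point, name, NFKC form) for FULLWIDTH characters."""
    rows = []
    # Scan the Halfwidth and Fullwidth Forms block, where these live.
    for cp in range(0xFF00, 0xFFF0):
        ch = chr(cp)
        name = unicodedata.name(ch, "")   # "" for unassigned code points
        if name.startswith("FULLWIDTH"):
            rows.append((ch, cp, name, unicodedata.normalize("NFKC", ch)))
    return rows


rows = fullwidth_rows()

# Characters whose compatibility form is NOT plain ASCII: these are the
# risky ones for sigils and operators under the rule proposed above.
no_ascii = [row for row in rows if not row[3].isascii()]

# Print code points and names only, so the output is safe on any terminal.
for _ch, cp, name, _folded in no_ascii:
    print(f"U+{cp:04X} {name}")
```

Under that assumption, the yen and cent signs both land in the non-ASCII group, which matches the concern that the rule would outlaw the cent sign as well.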
Re: Avoid the Yen Sign [Was: Re: new sigil]
Dan Kogai wrote:
> To make matters worse, there is not just one yen sign in Unicode.
> Take a look at this.
>
>     ¥  U+00A5  YEN SIGN
>     ￥  U+FFE5  FULLWIDTH YEN SIGN
>
> Though they look and read the same to humans, computers handle them
> differently. This happened when the Unicode Consortium decided to
> make the BMP round-trippable against legacy encodings. The two were
> distinct in the JIS standards, so they became distinct in Unicode
> too.

In addition to your handy table, the « and » French quotes, which are
used quite heavily in Perl 6 for both bracketing and hyper operators,
also have full-width equivalents:

    300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
    300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET

    Half width: «»
    Full width: 《》

There is no way to type the half-width yen sign and double angle
brackets under MSWin32, under either the traditional or the simplified
code pages; only the full-width variants are available.

One way to approach this is to make Perl 6 accept both the full- and
half-width variants.

Another way would be to use the ASCII fallbacks exclusively in real
programs and reserve the Unicode variants for pretty-printing, the
same way that PLT Scheme and Haskell use λ in the literature while
programmers actually write lambda and \ respectively in everyday code.

TIMTOWTDI. :)

Thanks,
/Autrijus/
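The first approach, accepting both widths, could look roughly like the Python sketch below: a pre-pass that folds the full-width variants onto the half-width ones before the source is parsed. The table `FOLD` and the function `fold_width` are hypothetical names, and nothing here reflects how any Perl 6 implementation actually does it. An explicit table is used because Unicode's NFKC normalization maps the fullwidth yen to U+00A5 but leaves the CJK double angle brackets untouched.

```python
import unicodedata

# Hypothetical folding table: full-width variant -> half-width operator.
FOLD = {
    0xFFE5: 0x00A5,  # FULLWIDTH YEN SIGN         -> YEN SIGN
    0x300A: 0x00AB,  # LEFT DOUBLE ANGLE BRACKET  -> LEFT-POINTING GUILLEMET
    0x300B: 0x00BB,  # RIGHT DOUBLE ANGLE BRACKET -> RIGHT-POINTING GUILLEMET
}


def fold_width(source: str) -> str:
    """Replace the listed full-width characters with half-width ones."""
    return source.translate(FOLD)


# NFKC alone handles the yen sign but not the brackets, hence the table.
assert unicodedata.normalize("NFKC", "\uffe5") == "\u00a5"
assert unicodedata.normalize("NFKC", "\u300a") == "\u300a"

# Made-up source fragments written with full-width characters.
zipped = fold_width("@a \uffe5 @b")
quoted = fold_width("\u300afoo bar\u300b")

print(zipped == "@a \u00a5 @b")
print(quoted == "\u00abfoo bar\u00bb")
```

A real implementation would need to leave string literals and comments alone, so a blanket translation like this is only a starting point.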
Re: Avoid the Yen Sign [Was: Re: new sigil]
On 10/23/05, Autrijus Tang [EMAIL PROTECTED] wrote:
> Dan Kogai wrote:
> > To make matters worse, there is not just one yen sign in Unicode.
> > Take a look at this.
> >
> >     ¥  U+00A5  YEN SIGN
> >     ￥  U+FFE5  FULLWIDTH YEN SIGN
> >
> > Though they look and read the same to humans, computers handle
> > them differently. This happened when the Unicode Consortium
> > decided to make the BMP round-trippable against legacy encodings.
> > The two were distinct in the JIS standards, so they became
> > distinct in Unicode too.
>
> In addition to your handy table, the « and » French quotes, which
> are used quite heavily in Perl 6 for both bracketing and hyper
> operators, also have full-width equivalents:
>
>     300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
>     300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET
>
>     Half width: «»
>     Full width: 《》
>
> There is no way to type the half-width yen sign and double angle
> brackets under MSWin32, under either the traditional or the
> simplified code pages; only the full-width variants are available.
>
> One way to approach this is to make Perl 6 accept both the full- and
> half-width variants.
>
> Another way would be to use the ASCII fallbacks exclusively in real
> programs and reserve the Unicode variants for pretty-printing, the
> same way that PLT Scheme and Haskell use λ in the literature while
> programmers actually write lambda and \ respectively in everyday
> code.

Doesn't this start to raise the question of why we have Unicode
operators instead of just functions? Would it be possible for a
function to be infix?

Rob