Re: new sigil

2005-10-23 Thread Damian Conway

Autrijus wrote:


> Indeed.  Somehow I think this makes some sense:
>
>     sub Bool eqv (|T $x, |T $y) { ... }


Except that it prevents anyone from ever writing:

multi sub circumfix:<| |> (Num $x) { return abs $x }
multi sub circumfix:<| |> (Vec $x) { return $x.mag }

which many mathematically inclined folks might find annoying.

(It also precludes intriguing possibilities like:

multi sub circumfix:«| >» ($q) { return Quantum::State.new(val => $q) }

which I personally would find irritating. ;-)
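For readers unfamiliar with multi subs, the two `| |` variants above choose an implementation from the argument's type. The following is only a rough Python analogue using `functools.singledispatch`; the `Vec` class and the `bars` function are hypothetical stand-ins, not anything from Perl 6 itself.

```python
import math
import numbers
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Vec:
    """Hypothetical 2-D vector standing in for the thread's Vec type."""
    x: float
    y: float

    def mag(self) -> float:
        return math.hypot(self.x, self.y)


@singledispatch
def bars(value):
    """Stand-in for circumfix |...|; fails for types with no variant."""
    raise TypeError(f"no |...| defined for {type(value).__name__}")


@bars.register
def _(value: numbers.Real):
    # Numeric variant: |x| is the absolute value.
    return abs(value)


@bars.register
def _(value: Vec):
    # Vector variant: |v| is the magnitude.
    return value.mag()


print(bars(-3))
print(bars(Vec(3.0, 4.0)))
```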

Damian


Re: new sigil

2005-10-23 Thread maeda
Luke Palmer wrote:
> limited access to system settings.
> And in those kinds of corporate environments, you're not going to be
> working with any code but code written in-house.  Which means that
> nobody is going to be using Latin-1, and everyone will be using the
> ASCII synonyms.  What's the problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.

Right.  We rely on Perl libraries from CPAN and elsewhere.  You
have to make sure that the code you are looking at has been
transferred via UTF-8-aware systems only.  Deciding to use the
ASCII synonyms ourselves is not enough to be safe.  We also have
to be sure that every module that happens to use Unicode
sigils/ops is installed without any intervening legacy systems.

An explanation of the situation in Japan follows.  Those who are
not interested in Japan can skip it.  This problem seems to be
unique to Japan.

(It is already a year since the yen sign became the zip operator.
This is not meant to kick off an argument, just some whining on my part. :P)

The problem lies not in writing code but in carrying files around.
   - You cannot tell whether a text file is in US-ASCII, UTF-8,
     or ShiftJIS when all the code points are at or below 0x7f.
     It is too late by the time you receive a code snippet from
     your colleague by mail.
   - If we convert a yen sign from Latin-1 (0xa5) to Unicode
     (UTF-8: c2 a5), then to the default coding system, which is
     believed to be ASCII but is actually ShiftJIS, it becomes
     0x5c.  There is no way to tell whether the byte was
     originally a backslash or a yen sign.

Grepping for yen signs doesn't help because by the time you run
grep, they are already backslashes.
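The lossy round trip described above can be sketched in a few lines. This is a minimal illustration, not a real ShiftJIS codec: the table `JIS_ROMAN_DIFFERENCES` and the function `encode_jis_roman` are hand-written, hypothetical stand-ins for the single-byte (JIS X 0201 Roman) half of a legacy system, and the encoder is assumed to be permissive in the way the message describes, sending both backslash and yen to 0x5c.

```python
# Hand-written stand-in for the JIS X 0201 Roman set: identical to ASCII
# except at two byte positions (assumption: only these two are modelled).
JIS_ROMAN_DIFFERENCES = {0x5C: "\u00a5", 0x7E: "\u203e"}  # yen sign, overline


def encode_jis_roman(text: str) -> bytes:
    """Encode text to single bytes the way a permissive legacy system might."""
    reverse = {char: byte for byte, char in JIS_ROMAN_DIFFERENCES.items()}
    out = bytearray()
    for char in text:
        if char in reverse:
            out.append(reverse[char])      # yen -> 0x5c, overline -> 0x7e
        elif ord(char) < 0x80:
            out.append(ord(char))          # plain ASCII, backslash included
        else:
            raise ValueError(f"cannot encode {char!r}")
    return bytes(out)


source = "@a \u00a5 @b"                  # zip operator written as a yen sign
as_latin1 = source.encode("latin-1")     # yen is the single byte 0xa5
as_utf8 = source.encode("utf-8")         # yen is the two bytes 0xc2 0xa5
as_legacy = encode_jis_roman(source)     # yen collapses to the byte 0x5c

# A later reader that assumes ASCII sees a backslash; nothing in the bytes
# records that the character used to be a yen sign.
seen_as_ascii = as_legacy.decode("ascii")
backslash_version = encode_jis_roman("@a \\ @b")

print(as_latin1.hex(" "), "|", as_utf8.hex(" "), "|", as_legacy.hex(" "))
print(seen_as_ascii, backslash_version == as_legacy)
```

Because the yen and backslash versions produce identical bytes, no search run after the conversion can recover which one the author typed.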

If we find a lot of yen signs used as zip operators in the standard
library, Japanese users would face a big question: give up either
Perl 6 or Windows.  Which do we need?  And I suppose the answer
would be "We have plenty of substitutes for Perl 6: Ruby, Perl 5,
etc."

In [EMAIL PROTECTED] Larry wrote:
> (Of course, we'll leave out the little problem that half the people
> in Japan would read it as a backslash wannabe...that's not really
> a problem since a zipper would only be used where an operator is
> expected, and backslash is illegal there (so far).)

It is not people who read a yen sign as a backslash, but legacy
systems.  We might define backslash as a synonym for the zip op,
but that is too risky.  In Japan, yen as zip carries the same
magnitude of risk.

-- 
Kaoru Maeda
[EMAIL PROTECTED]


Re: new sigil

2005-10-23 Thread Kaoru Maeda

Luke Palmer wrote:
> limited access to system settings.
> And in those kinds of corporate environments, you're not going to be
> working with any code but code written in-house.  Which means that
> nobody is going to be using Latin-1, and everyone will be using the
> ASCII synonyms.  What's the problem?

Dave Whipp wrote:
> My experience is that this isn't true: we use lots of external code,
> but I still need to file requests with IT to get system-settings changed.

Right.  We rely on Perl libraries from CPAN and elsewhere.
You have to make sure that the code you are looking at has been
transferred via UTF-8-aware systems only.
Deciding to use the ASCII synonyms ourselves is not enough to be safe.
We also have to be sure that every module that happens to
use Unicode sigils/ops is installed without any intervening
legacy systems.

An explanation of the situation in Japan follows.  Those who are not
interested in Japan can skip it.  This problem seems to be unique
to Japan.  It is already a year since the yen sign became the zip operator.
This is not meant to kick off a discussion, just some whining on my part. :P

The old ISO 646 standard allowed national variants, which replace certain
ASCII characters with local symbols.  Currency signs were the first
candidates for this.
http://en.wikipedia.org/wiki/ISO_646
This legacy convention is still alive in Japan in the JIS/ShiftJIS encodings.
I hope Unicode supersedes them and the backslash-yen confusion disappears,
but the transition is not happening quickly enough.
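To illustrate how a national variant changes what a reader sees without changing a single byte, here is a small sketch. The table `ISO646_VARIANTS` is hand-written and deliberately incomplete (an assumption for illustration: it lists only the yen and overline positions of the Japanese variant and the pound position of the British one), and `render` is a hypothetical helper, not a library function.

```python
# Byte positions where a national variant shows a different glyph than
# US-ASCII. Only a few well-known entries are listed here.
ISO646_VARIANTS = {
    "US": {},
    "JP": {0x5C: "\u00a5", 0x7E: "\u203e"},   # yen sign, overline
    "GB": {0x23: "\u00a3"},                    # pound sign
}


def render(data: bytes, variant: str) -> str:
    """Show 7-bit bytes as a display using the given variant would."""
    overrides = ISO646_VARIANTS[variant]
    return "".join(overrides.get(byte, chr(byte)) for byte in data)


snippet = rb"print \$x # price"   # the same bytes in every case below

for variant in ("US", "JP", "GB"):
    print(variant, render(snippet, variant))
```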

The problem lies not in writing code but in carrying files around.
  - You cannot tell whether a text file is in US-ASCII, UTF-8,
    or ShiftJIS when all the code points are at or below 0x7f.  It is
    too late by the time you receive a code snippet from your
    colleague by mail.
  - If we convert a yen sign from Latin-1 (0xa5) to Unicode
    (UTF-8: c2 a5), then to the default coding system,
    which is believed to be ASCII but is actually
    ShiftJIS, it becomes 0x5c.  There is no way to tell
    whether the byte was originally a backslash or a yen sign.

Grepping for yen signs doesn't help because by the time you
run grep, they are already backslashes.

If we find a lot of yen signs used as zip operators in the standard library,
we face a big question: give up either Perl 6 or Windows.  Which do we abandon?
And I suppose the answer would be "We have plenty of substitutes for Perl 6:
Ruby, Perl 5, etc."

In Japan, the yen sign is a synonym for backslash.  We wish to retain this legacy.
The zip operator is far less important than the regex-escape, string-escape, and
take-reference operators.

--
Kaoru Maeda
[EMAIL PROTECTED]


Avoid the Yen Sign [Was: Re: new sigil]

2005-10-23 Thread Dan Kogai

Maeda-san and the list members,

Thank you for raising this issue and sorry for not raising this myself.

On Oct 22, 2005, at 19:42, Kaoru Maeda wrote:

> If we find a lot of yen signs used as zip operators in the standard library,
> we face a big question: give up either Perl 6 or Windows.  Which do we abandon?
> And I suppose the answer would be "We have plenty of substitutes for Perl 6:
> Ruby, Perl 5, etc."
>
> In Japan, the yen sign is a synonym for backslash.  We wish to retain this legacy.
> The zip operator is far less important than the regex-escape, string-escape, and
> take-reference operators.


To make matters worse, there is not just one yen sign in
Unicode.  Take a look at this.


¥ U+00A5 YEN SIGN
￥ U+FFE5 FULLWIDTH YEN SIGN

Though they look and read the same to humans, computers handle them
differently.  This happened when the Unicode Consortium decided to make
the BMP round-trippable against legacy encodings.  They were distinct in
the JIS standards, so they ended up distinct in Unicode too.


Maybe we should avoid other symbols like this for sigils: those not
in ASCII that have 'fullwidth' variants.  q($) and q(\) are okay
(or it is too late for them) because they are already in ASCII.  q(¥)
should be avoided because you can hardly tell it apart from q(￥) on
the display.
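The difference between the two yen signs can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch; the variable names are arbitrary.

```python
import unicodedata

half = "\u00a5"   # YEN SIGN
full = "\uffe5"   # FULLWIDTH YEN SIGN

for char in (half, full):
    print(f"U+{ord(char):04X} {unicodedata.name(char)} "
          f"width={unicodedata.east_asian_width(char)}")

# The two are different code points, so a parser that compares characters
# directly treats them as two different operators.
distinct = half != full

# Compatibility normalization (NFKC) folds the fullwidth form onto the
# halfwidth one, which is one way a tool could choose to treat them alike.
folded = unicodedata.normalize("NFKC", full)
print(distinct, folded == half)
```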


But this will also outlaw the cent sign.  I have attached a list of
those affected.  As you can see, most have ASCII equivalents but some
do not.


Dan the Man with Too Many Signs to Deal With

% grep FULLWIDTH /usr/local/lib/perl5/5.8.7/unicore/Name.pl | perl -Mencoding=utf8 -aple '$_=chr(hex($F[0]))."\t".$_'

！   FF01  FULLWIDTH EXCLAMATION MARK
＂   FF02  FULLWIDTH QUOTATION MARK
＃   FF03  FULLWIDTH NUMBER SIGN
＄   FF04  FULLWIDTH DOLLAR SIGN
％   FF05  FULLWIDTH PERCENT SIGN
＆   FF06  FULLWIDTH AMPERSAND
＇   FF07  FULLWIDTH APOSTROPHE
（   FF08  FULLWIDTH LEFT PARENTHESIS
）   FF09  FULLWIDTH RIGHT PARENTHESIS
＊   FF0A  FULLWIDTH ASTERISK
＋   FF0B  FULLWIDTH PLUS SIGN
，   FF0C  FULLWIDTH COMMA
－   FF0D  FULLWIDTH HYPHEN-MINUS
．   FF0E  FULLWIDTH FULL STOP
／   FF0F  FULLWIDTH SOLIDUS
０   FF10  FULLWIDTH DIGIT ZERO
１   FF11  FULLWIDTH DIGIT ONE
２   FF12  FULLWIDTH DIGIT TWO
３   FF13  FULLWIDTH DIGIT THREE
４   FF14  FULLWIDTH DIGIT FOUR
５   FF15  FULLWIDTH DIGIT FIVE
６   FF16  FULLWIDTH DIGIT SIX
７   FF17  FULLWIDTH DIGIT SEVEN
８   FF18  FULLWIDTH DIGIT EIGHT
９   FF19  FULLWIDTH DIGIT NINE
：   FF1A  FULLWIDTH COLON
；   FF1B  FULLWIDTH SEMICOLON
＜   FF1C  FULLWIDTH LESS-THAN SIGN
＝   FF1D  FULLWIDTH EQUALS SIGN
＞   FF1E  FULLWIDTH GREATER-THAN SIGN
？   FF1F  FULLWIDTH QUESTION MARK
＠   FF20  FULLWIDTH COMMERCIAL AT
Ａ   FF21  FULLWIDTH LATIN CAPITAL LETTER A
Ｂ   FF22  FULLWIDTH LATIN CAPITAL LETTER B
Ｃ   FF23  FULLWIDTH LATIN CAPITAL LETTER C
Ｄ   FF24  FULLWIDTH LATIN CAPITAL LETTER D
Ｅ   FF25  FULLWIDTH LATIN CAPITAL LETTER E
Ｆ   FF26  FULLWIDTH LATIN CAPITAL LETTER F
Ｇ   FF27  FULLWIDTH LATIN CAPITAL LETTER G
Ｈ   FF28  FULLWIDTH LATIN CAPITAL LETTER H
Ｉ   FF29  FULLWIDTH LATIN CAPITAL LETTER I
Ｊ   FF2A  FULLWIDTH LATIN CAPITAL LETTER J
Ｋ   FF2B  FULLWIDTH LATIN CAPITAL LETTER K
Ｌ   FF2C  FULLWIDTH LATIN CAPITAL LETTER L
Ｍ   FF2D  FULLWIDTH LATIN CAPITAL LETTER M
Ｎ   FF2E  FULLWIDTH LATIN CAPITAL LETTER N
Ｏ   FF2F  FULLWIDTH LATIN CAPITAL LETTER O
Ｐ   FF30  FULLWIDTH LATIN CAPITAL LETTER P
Ｑ   FF31  FULLWIDTH LATIN CAPITAL LETTER Q
Ｒ   FF32  FULLWIDTH LATIN CAPITAL LETTER R
Ｓ   FF33  FULLWIDTH LATIN CAPITAL LETTER S
Ｔ   FF34  FULLWIDTH LATIN CAPITAL LETTER T
Ｕ   FF35  FULLWIDTH LATIN CAPITAL LETTER U
Ｖ   FF36  FULLWIDTH LATIN CAPITAL LETTER V
Ｗ   FF37  FULLWIDTH LATIN CAPITAL LETTER W
Ｘ   FF38  FULLWIDTH LATIN CAPITAL LETTER X
Ｙ   FF39  FULLWIDTH LATIN CAPITAL LETTER Y
Ｚ   FF3A  FULLWIDTH LATIN CAPITAL LETTER Z
［   FF3B  FULLWIDTH LEFT SQUARE BRACKET
＼   FF3C  FULLWIDTH REVERSE SOLIDUS
］   FF3D  FULLWIDTH RIGHT SQUARE BRACKET
＾   FF3E  FULLWIDTH CIRCUMFLEX ACCENT
＿   FF3F  FULLWIDTH LOW LINE
｀   FF40  FULLWIDTH GRAVE ACCENT
ａ   FF41  FULLWIDTH LATIN SMALL LETTER A
ｂ   FF42  FULLWIDTH LATIN SMALL LETTER B
ｃ   FF43  FULLWIDTH LATIN SMALL LETTER C
d   
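A comparable list can be produced without the Perl installation path, using Python's `unicodedata` module. This is a sketch under two assumptions: it scans only the Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF), and it treats "has an ASCII equivalent" as "NFKC normalization yields only ASCII". The helper names are hypothetical.

```python
import unicodedata


def fullwidth_characters():
    """Yield (char, name) for characters named FULLWIDTH ... in the
    Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF)."""
    for code in range(0xFF00, 0xFFF0):
        char = chr(code)
        name = unicodedata.name(char, "")   # "" for unassigned code points
        if name.startswith("FULLWIDTH"):
            yield char, name


def has_ascii_equivalent(char: str) -> bool:
    """True when NFKC folding yields only ASCII characters."""
    folded = unicodedata.normalize("NFKC", char)
    return all(ord(c) < 0x80 for c in folded)


with_ascii = []
without_ascii = []
for char, name in fullwidth_characters():
    (with_ascii if has_ascii_equivalent(char) else without_ascii).append(name)

print(len(with_ascii), "have ASCII equivalents")
for name in without_ascii:
    print("no ASCII equivalent:", name)
```

The second list is the interesting one for sigil choice: it contains the currency signs, including the cent and yen signs mentioned above.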

Re: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-23 Thread Autrijus Tang
Dan Kogai wrote:
> To make matters worse, there is not just one yen sign in Unicode.
> Take a look at this.
>
> ¥ U+00A5 YEN SIGN
> ￥ U+FFE5 FULLWIDTH YEN SIGN
>
> Though they look and read the same to humans, computers handle them
> differently.  This happened when the Unicode Consortium decided to make
> the BMP round-trippable against legacy encodings.  They were distinct in
> the JIS standards, so they ended up distinct in Unicode too.

In addition to your handy table, the « and » French quotes, which are used
quite heavily in Perl 6 for both bracketing and hyper operators, also have
full-width equivalents:

300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET

Half width: «»
Full width: 《》

There is no way to type the half-width yen sign or double angle brackets on
MSWin32 under either the Traditional or Simplified Chinese code pages; only
the full-width variants are available.

One way to approach it is to make Perl 6 accept both full- and
half-width variants.
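A minimal sketch of the first approach, assuming a pre-pass that runs before the lexer: the table `FULLWIDTH_TO_HALFWIDTH` and the function `fold_operators` are hypothetical names. An explicit table is used because U+300A and U+300B have no Unicode compatibility mapping to « and », so normalization alone would not fold them.

```python
# Hypothetical folding table: each full-width form maps to the half-width
# operator character discussed in the thread.
FULLWIDTH_TO_HALFWIDTH = str.maketrans({
    "\uffe5": "\u00a5",   # FULLWIDTH YEN SIGN         -> YEN SIGN
    "\u300a": "\u00ab",   # LEFT DOUBLE ANGLE BRACKET  -> left French quote
    "\u300b": "\u00bb",   # RIGHT DOUBLE ANGLE BRACKET -> right French quote
})


def fold_operators(source: str) -> str:
    """Rewrite full-width operator characters to their half-width forms.

    A real lexer would skip string literals and comments; this sketch
    rewrites every occurrence.
    """
    return source.translate(FULLWIDTH_TO_HALFWIDTH)


print(fold_operators("@a \uffe5 @b"))
print(fold_operators("\u300afoo bar\u300b"))
```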

Another way would be to use ASCII fallbacks exclusively in real programs and
reserve the Unicode variants for pretty-printing, the same way that PLT Scheme
and Haskell use λ in the literature while people actually write lambda and \
respectively in everyday coding.
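The second approach could look roughly like this: files stay pure ASCII on disk, and only a display step substitutes the Unicode forms. The sketch assumes `<<` and `>>` are the ASCII spellings of the French quotes; `ASCII_TO_DISPLAY` and `prettify` are hypothetical names, and the naive string replacement ignores string literals and comments.

```python
# Assumed ASCII spellings mapped to the characters used for display only.
ASCII_TO_DISPLAY = {"<<": "\u00ab", ">>": "\u00bb"}


def prettify(source: str) -> str:
    """Return a display-only copy with ASCII digraphs shown as Unicode."""
    for ascii_form, pretty in ASCII_TO_DISPLAY.items():
        source = source.replace(ascii_form, pretty)
    return source


stored = "@sums = @a >>+<< @b;"   # what is saved on disk: pure ASCII
shown = prettify(stored)           # what a pretty-printer would display

print(stored)
print(shown)
```

Since the stored file never contains a byte above 0x7f, it survives transfer through legacy systems unchanged.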

TIMTOWTDI. :)

Thanks,
/Autrijus/


Re: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-23 Thread Rob Kinyon
On 10/23/05, Autrijus Tang [EMAIL PROTECTED] wrote:
> Dan Kogai wrote:
> > To make matters worse, there is not just one yen sign in Unicode.
> > Take a look at this.
> >
> > ¥ U+00A5 YEN SIGN
> > ￥ U+FFE5 FULLWIDTH YEN SIGN
> >
> > Though they look and read the same to humans, computers handle them
> > differently.  This happened when the Unicode Consortium decided to make
> > the BMP round-trippable against legacy encodings.  They were distinct in
> > the JIS standards, so they ended up distinct in Unicode too.
>
> In addition to your handy table, the « and » French quotes, which are used
> quite heavily in Perl 6 for both bracketing and hyper operators, also have
> full-width equivalents:
>
> 300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
> 300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET
>
> Half width: «»
> Full width: 《》
>
> There is no way to type the half-width yen sign or double angle brackets on
> MSWin32 under either the Traditional or Simplified Chinese code pages; only
> the full-width variants are available.
>
> One way to approach it is to make Perl 6 accept both full- and
> half-width variants.
>
> Another way would be to use ASCII fallbacks exclusively in real programs and
> reserve the Unicode variants for pretty-printing, the same way that PLT Scheme
> and Haskell use λ in the literature while people actually write lambda and \
> respectively in everyday coding.

Isn't this becoming a question of why we have Unicode operators at
all instead of just functions? Would it be possible to make a
function infix?

Rob