Re: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-25 Thread Larry Wall
On Sun, Oct 23, 2005 at 10:55:34PM +0900, Dan Kogai wrote:
: To make the matter worse, there is not just one yen sign in
: Unicode. Take a look at this.
: 
: ¥ U+00A5 YEN SIGN
: ￥ U+FFE5 FULLWIDTH YEN SIGN
: 
: Though they look and read the same to humans, computers handle them
: differently.  This happened when the Unicode Consortium decided to make the
: BMP round-trippable against legacy encodings.  They were distinct in the
: JIS standards, so they are distinct in Unicode too.
: 
: Maybe we should avoid other symbols like this for sigils -- those not  
: in ASCII that have 'fullwidth' variations.  q($) and q(\) are okay  
: (or too late) because they are already in ASCII.  q(¥) should be  
: avoided because you can hardly tell the difference from q(￥) in the
: display.
: 
: But this will also outlaw the cent sign.  I have attached a list of  
: those affected.  As you see, most are with ASCII equivalents but some  
: are not.

We'd have to outlaw A..Z as well.  :-)

I think a better plan might just be to say that we'll treat any fullwidth
character as equivalent to its narrow companion, at least when used as
an operator.  Canonicalizing identifiers may be another matter though.
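
A minimal Python sketch of that kind of folding, assuming the equivalence is read off the <wide> compatibility mappings in the Unicode Character Database. The helper name fold_fullwidth is hypothetical and is not anything Perl 6 defines; it only illustrates "treat a fullwidth character as its narrow companion".

```python
import unicodedata

def fold_fullwidth(text: str) -> str:
    """Replace each fullwidth character with its narrow companion.

    Assumption: "fullwidth" means the character's UCD decomposition is
    tagged <wide>. Everything else is passed through untouched.
    """
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 00A5"
        if decomp.startswith("<wide> "):
            narrow = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
            out.append(narrow)
        else:
            out.append(ch)
    return "".join(out)

# U+FFE5 FULLWIDTH YEN SIGN folds to U+00A5 YEN SIGN.
print(fold_fullwidth("\uffe5") == "\u00a5")
```

Only characters with a <wide> mapping are folded, so look-alikes that Unicode treats as unrelated characters pass through unchanged and would need their own rule.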

On the other hand, certain of the double-width characters are likely to
be confused with two singles, such as 

＝   FF1D    FULLWIDTH EQUALS SIGN
＿   FF3F    FULLWIDTH LOW LINE

so maybe they should be equivalent to == and __, or outlawed.

And one could (un)reasonably argue that

～   FF5E    FULLWIDTH TILDE

ought to mean ~~ rather than ~.  But in general we need to go slow
on such decisions.  For now just sticking our toe into Latin-1
is enough, as long as we're looking ahead for visual pitfalls.
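
The "confused with two singles" worry can be made concrete with a rough column count. This is a sketch under a simplifying assumption: a character takes two terminal columns when its East Asian Width property is F or W and one column otherwise (combining marks and font quirks are ignored). The function name display_columns is made up for the example.

```python
import unicodedata

def display_columns(text: str) -> int:
    """Approximate column count: 2 for East Asian Fullwidth/Wide, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
        for ch in text
    )

# Each fullwidth character is paired with the doubled ASCII spelling
# it might be mistaken for.
for wide, doubled in (("\uff1d", "=="), ("\uff3f", "__"), ("\uff5e", "~~")):
    print(f"U+{ord(wide):04X}: {display_columns(wide)} columns, "
          f"{doubled!r}: {display_columns(doubled)} columns")
```

Under that assumption one fullwidth character and two narrow ones occupy the same width on screen, which is exactly where the visual pitfall comes from.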

As for the ¥ pitfall, so far we've intentionally been careful to use
it only where an operator is expected, whereas \ is legal only where a
term is expected.  So at least for Perl code, we can translate legacy
¥ to different codepoints.  (Whether the Japanese font distinguishes
them is another issue, of course.  I have a Unicode font on my
machine that prints backslash as ¥, which I find slightly irritating,
but doubtless will be par for the course in Japan for the foreseeable
future.  Maybe that's a good reason to allow the doublewidth backslash
as an alias for normal backslash.  Maybe not.)

Anyway, I think people will be able to distinguish visually between
A ¥ B and ¥X as long as we keep the operator/term distinction.
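
A toy Python model of that operator/term distinction, not the Perl 6 grammar. It assumes a flat token list in which anything that is not a yen sign or backslash counts as a term, and it reads either glyph by position: as infix zip where an operator is expected, and as prefix backslash where a term is expected. The function name and the labels are invented for the sketch.

```python
def disambiguate(tokens):
    """Label each token by parser position (toy model, hypothetical labels)."""
    expect_term = True          # a program starts where a term is expected
    labelled = []
    for tok in tokens:
        if tok in ("\u00a5", "\\"):
            if expect_term:
                # Term position: read the glyph as the prefix backslash.
                labelled.append(("prefix-backslash", "\\"))
                # A prefix operator is still followed by a term.
            else:
                # Operator position: read the glyph as the infix zip.
                labelled.append(("infix-zip", "\u00a5"))
                expect_term = True
        else:
            labelled.append(("term", tok))
            expect_term = False
    return labelled

print(disambiguate(["A", "\u00a5", "B"]))   # yen between two terms
print(disambiguate(["\u00a5", "X"]))        # yen in front of a term
```

Because the decision depends only on position, a legacy file where both characters share one byte could in principle be translated to distinct codepoints this way.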

Larry


Re: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-25 Thread Larry Wall
: On 10/23/05, Autrijus Tang [EMAIL PROTECTED] wrote:
: > In addition to your handy table, the « and » french quotes, which are used
: > quite heavily in Perl 6 for both bracketing and hyper operators, also have
: > full width equivalents:
: >
: > 300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
: > 300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET
: >
: > Half width: «»
: > Full width: 《》

I think we actually speculated about that identity in the Apocalypse.

: > One way to approach it is to make Perl 6 accept both full- and
: > half-width variants.
: >
: > Another way would be to use ASCII fallbacks exclusively in real programs,
: > and reserve Unicode variants for pretty-printing, the same way that PLT
: > Scheme and Haskell recognize λ in the literature, but actually write
: > lambda and \ respectively in everyday coding.

I think we should enable both approaches.  Restricting Unicode characters
to literature is wrong, but so is forcing Unicode on someone prematurely.
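
One way to picture "both approaches" is a single alias table that a lexer consults, so the Unicode spelling, an ASCII fallback, and a wide variant all name the same operator. The Python sketch below is illustrative only. The table contents and canonical names are assumptions, and accepting U+FFE5, U+300A and U+300B is the proposal under discussion rather than settled behavior. Note that U+300A and U+300B carry no compatibility mapping to the guillemets in Unicode, so they would need explicit entries like these.

```python
# Hypothetical alias table: every accepted spelling maps to one canonical name.
ALIASES = {
    "\u00a5": "zip",          # YEN SIGN
    "\uffe5": "zip",          # FULLWIDTH YEN SIGN (assumed accepted)
    "Y":      "zip",          # ASCII fallback
    "\u00ab": "open-quote",   # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "<<":     "open-quote",   # ASCII fallback
    "\u300a": "open-quote",   # LEFT DOUBLE ANGLE BRACKET (assumed accepted)
    "\u00bb": "close-quote",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    ">>":     "close-quote",  # ASCII fallback
    "\u300b": "close-quote",  # RIGHT DOUBLE ANGLE BRACKET (assumed accepted)
}

def canonical(token: str) -> str:
    """Return the canonical operator name, or the token itself if unknown."""
    return ALIASES.get(token, token)

print(canonical("<<") == canonical("\u00ab") == canonical("\u300a"))
```

With a table like this nobody is forced onto Unicode, and nobody is forbidden from using it either.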

On Sun, Oct 23, 2005 at 07:07:33PM -0400, Rob Kinyon wrote:
: Isn't this starting to be the question of why we have the Unicode
: operators instead of just functions? Would it be possible to have a
: function be infix?

At which precedence level?
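
Why the precedence question matters can be seen with a small precedence-climbing parser. This Python sketch is a toy, not Perl 6's parser: tokens are plain strings, all operators are left-associative, and the numeric levels are arbitrary values chosen for the example. The same token stream groups differently depending on where a user-defined infix "zip" is placed.

```python
def parse(tokens, prec):
    """Parse a flat token list into nested (op, left, right) tuples.

    prec maps each infix operator to a binding power; higher binds tighter.
    Any token not in prec is treated as an operand.
    """
    pos = 0

    def parse_expr(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while (pos < len(tokens)
               and tokens[pos] in prec
               and prec[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # Left-associative: the right side must bind strictly tighter.
            right = parse_expr(prec[op] + 1)
            left = (op, left, right)
        return left

    return parse_expr(0)

tokens = ["a", "+", "b", "zip", "c"]
print(parse(tokens, {"+": 10, "zip": 5}))    # zip looser than +
print(parse(tokens, {"+": 10, "zip": 20}))   # zip tighter than +
```

With the looser setting the sum is formed first and then zipped with c; with the tighter setting b and c are zipped before the addition. An infix function has to pick one.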

Larry


RE: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-25 Thread Jan Dubois
On Tue, 25 Oct 2005, Larry Wall wrote:
> As for the ¥ pitfall, so far we've intentionally been careful to use
> it only where an operator is expected, whereas \ is legal only where a
> term is expected.  So at least for Perl code, we can translate legacy
> ¥ to different codepoints.  (Whether the Japanese font distinguishes
> them is another issue, of course.  I have a Unicode font on my
> machine that prints backslash as ¥, which I find slightly irritating,
> but doubtless will be par for the course in Japan for the foreseeable
> future.  Maybe that's a good reason to allow the doublewidth backslash
> as an alias for normal backslash.  Maybe not.)

BTW, the exact same thing happens with the Won sign ₩ on Korean Windows
systems; it is also mapped to 0x5c in the default codepage, and paths
are displayed with the Won sign instead of the backslash as separators.
Just something to keep in mind in case you are tempted to use the Won
sign as a sigil or operator in the future.
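
A small Python check of the point about byte 0x5C, assuming Python's built-in cp932 and cp949 codecs as stand-ins for the Japanese and Korean Windows code pages. Both decode 0x5C to U+005C, so the yen or won glyph seen in paths is a matter of font rendering, while the real currency signs live at separate code points. The helper name decode_0x5c is invented for the example.

```python
import unicodedata

def decode_0x5c(codec: str) -> str:
    """Decode the single byte 0x5C with the given codec."""
    return b"\x5c".decode(codec)

for codec in ("cp932", "cp949"):
    ch = decode_0x5c(codec)
    print(f"{codec}: 0x5C -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# The currency signs themselves are distinct characters.
for ch in ("\u00a5", "\u20a9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

So a parser that sees U+005C cannot tell from the text alone which glyph the user was looking at when typing it.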

Cheers,
-Jan




Re: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-25 Thread Juerd
Jan Dubois skribis 2005-10-25 12:33 (-0700):
> Just something to keep in mind in case you are tempted to use the Won
> sign as a sigil or operator in the future.

I don't know what stitch() will do, but this will have to be its infix
operator :)

zip ¥   Y
stitch  Won   w


Juerd
-- 
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html 
http://convolution.nl/gajigu_juerd_n.html


Re: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-23 Thread Autrijus Tang
Dan Kogai wrote:
> To make the matter worse, there is not just one yen sign in Unicode.
> Take a look at this.
>
> ¥ U+00A5 YEN SIGN
> ￥ U+FFE5 FULLWIDTH YEN SIGN
>
> Though they look and read the same to humans, computers handle them
> differently.  This happened when the Unicode Consortium decided to make the
> BMP round-trippable against legacy encodings.  They were distinct in the JIS
> standards, so they are distinct in Unicode too.

In addition to your handy table, the « and » french quotes, which are used
quite heavily in Perl 6 for both bracketing and hyper operators, also have
full width equivalents:

300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET

Half width: «»
Full width: 《》

There is no way to type out the half-width yen and double angle brackets under
MSWin32, under either the traditional or simplified code pages; only full width
variants are available.

One way to approach it is to make Perl 6 accept both full- and
half-width variants.

Another way would be to use ASCII fallbacks exclusively in real programs, and
reserve Unicode variants for pretty-printing, the same way that PLT Scheme and
Haskell recognize λ in the literature, but actually write lambda and \
respectively in everyday coding.

TIMTOWTDI. :)

Thanks,
/Autrijus/


Re: Avoid the Yen Sign [Was: Re: new sigil]

2005-10-23 Thread Rob Kinyon
On 10/23/05, Autrijus Tang [EMAIL PROTECTED] wrote:
> Dan Kogai wrote:
> > To make the matter worse, there is not just one yen sign in Unicode.
> > Take a look at this.
> >
> > ¥ U+00A5 YEN SIGN
> > ￥ U+FFE5 FULLWIDTH YEN SIGN
> >
> > Though they look and read the same to humans, computers handle them
> > differently.  This happened when the Unicode Consortium decided to make
> > the BMP round-trippable against legacy encodings.  They were distinct in
> > the JIS standards, so they are distinct in Unicode too.
>
> In addition to your handy table, the « and » french quotes, which are used
> quite heavily in Perl 6 for both bracketing and hyper operators, also have
> full width equivalents:
>
> 300A;LEFT DOUBLE ANGLE BRACKET;Ps;0;ON;Y;OPENING DOUBLE ANGLE BRACKET
> 300B;RIGHT DOUBLE ANGLE BRACKET;Pe;0;ON;Y;CLOSING DOUBLE ANGLE BRACKET
>
> Half width: «»
> Full width: 《》
>
> There is no way to type out the half-width yen and double angle brackets
> under MSWin32, under either the traditional or simplified code pages; only
> full width variants are available.
>
> One way to approach it is to make Perl 6 accept both full- and
> half-width variants.
>
> Another way would be to use ASCII fallbacks exclusively in real programs,
> and reserve Unicode variants for pretty-printing, the same way that PLT
> Scheme and Haskell recognize λ in the literature, but actually write
> lambda and \ respectively in everyday coding.

Isn't this starting to be the question of why we have the Unicode
operators instead of just functions? Would it be possible to have a
function be infix?

Rob