Re: Suggested magic for a .. b

2010-08-01 Thread Martin D Kealey
On Wed, 28 Jul 2010, Darren Duncan wrote:
 I think that a general solution here is to accept that there may be more
 than one valid way to sort some types, strings especially, and so
 operators/routines that do sorting should be customizable in some way so
 users can pick the behaviour they want.

 The customization could be applied at various levels, such as using an
 extra argument or trait for the operator/function that cares about
 ordering,

That much I agree wholeheartedly with, but ...

 or by using an extra attribute or trait for the types being sorted.

... puts us back where we started: how do we cope if the two endpoints
aren't tagged with the same attribute or trait or locale?

In any case I'd much rather prefer that the behaviour be lexically scoped,
with either adverbs or pragmata, not with the action-at-a-distance that's
caused by tagging something as fundamental as a String.

Yes sometimes you want the behaviour of your range to mimic the locale of
its operands, but then it should be explicit, with a trait that also
explicitly selects either the left or right operand to extract the locale
from. And probably throw an exception if they aren't in the same locale.

If you don't specify that you want locale-dependent behaviour then the
default action should be an unthrown exception unless both endpoints are
inarguably comparable, so IMHO that pretty much rules out any code-points
that are used in more than one language, save perhaps raw ASCII. And even then
you really should make an explicit choice between case-sensitive and
case-insensitive comparison.
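
A minimal Python sketch of the policy argued for above, not Perl 6 code. The names `compare`, `CollationRequired` and the `key` parameter are hypothetical, chosen only for illustration. The comparison refuses to guess an ordering unless both operands are plain ASCII or the caller supplies an explicit collation key. It raises immediately, whereas the message proposes an unthrown exception.

```python
class CollationRequired(Exception):
    """Raised when no ordering was chosen for non-ASCII operands."""


def compare(a, b, key=None):
    """Return -1, 0 or 1; `key` is an explicit, caller-chosen collation."""
    if key is None:
        if not (a.isascii() and b.isascii()):
            raise CollationRequired(f"no collation given for {a!r} and {b!r}")
        key = lambda s: s  # raw code point order, ASCII operands only
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


print(compare("apple", "banana"))                    # -1
print(compare("Apple", "apple", key=str.casefold))   # 0
try:
    compare("a", "ส้")
except CollationRequired as err:
    print("refused:", err)
```

Passing `key=str.casefold` is the explicit case-insensitive choice; leaving `key` out for ASCII operands is the case-sensitive one.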

 When you want to be consistent, the behaviour of cmp affects all of the
 other order-sensitive operations, including any working with intervals.

Indeed, the range constructor and the cmp operator should have the same
adverbs and share lexical pragmata.

 So then, "a" cmp "ส้" is always defined, but users can change the
 definition.

I take the opposite approach; it's always undefined (read, unthrown
exception) unless the user tells us how they want it treated. That can be a
command-line switch if necessary.

To paraphrase Dante, the road to hell is paved with Reasonable Defaults.
Or in programming terms, your reasonable default is the cause of my ugly
work-around.

-Martin


Re: Suggested magic for a .. b

2010-08-01 Thread Darren Duncan

Martin D Kealey wrote:

On Wed, 28 Jul 2010, Darren Duncan wrote:

I think that a general solution here is to accept that there may be more
than one valid way to sort some types, strings especially, and so
operators/routines that do sorting should be customizable in some way so
users can pick the behaviour they want.

The customization could be applied at various levels, such as using an
extra argument or trait for the operator/function that cares about
ordering,


That much I agree wholeheartedly with, but ...


or by using an extra attribute or trait for the types being sorted.


... puts us back where we started: how do we cope if the two endpoints
aren't tagged with the same attribute or trait or locale?

In any case I'd much rather prefer that the behaviour be lexically scoped,
with either adverbs or pragmata, not with the action-at-a-distance that's
caused by tagging something as fundamental as a String.


Lexical scoping *is* a good idea, and I would also imagine that users would 
frequently apply that at the file or setting level.


But making this a pragma means that the pragma would have to be a little more 
verbose than a typical pragma.


In the general format, one wouldn't just say, eg:

  collation FooNation;

... but rather it would at least be more like:

  collation Str FooNation;

... to say that you're only applying to operations involving Str types and not, 
say, Numeric types.



So then, "a" cmp "ส้" is always defined, but users can change the
definition.


I take the opposite approach; it's always undefined (read, unthrown
exception) unless the user tells us how they want it treated. That can be a
command-line switch if necessary.

To paraphrase Dante, the road to hell is paved with Reasonable Defaults.
Or in programming terms, your reasonable default is the cause of my ugly
work-around.


That might be fair.

But if we're going to do that, then I'd like to go a step further and require 
some other operators have mandatory config arguments for users to explicitly 
state the semantics they want, but that once again a lexical pragma can declare 
this at a higher level.


I'm restating this thought in another thread, rounding method adverbs, so 
that's the best place to follow it.


-- Darren Duncan


Re: Suggested magic for a .. b

2010-08-01 Thread Leon Timmermans
On Sun, Aug 1, 2010 at 11:39 PM, Martin D Kealey
mar...@kurahaupo.gen.nz wrote:
 In any case I'd much rather prefer that the behaviour be lexically scoped,
 with either adverbs or pragmata, not with the action-at-a-distance that's
 caused by tagging something as fundamental as a String.

In many cases the collation isn't known at compile-time, so adverbs
would be necessary anyway. Pragmas can make things easier in many
cases.

Leon


Re: Suggested magic for a .. b

2010-07-30 Thread Brandon S Allbery KF8NH

On 7/29/10 08:15 , Leon Timmermans wrote:
 On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan dar...@darrenduncan.net 
 wrote:
  $foo ~~ $a..$b :QuuxNationality  # just affects this one test
 
 I like that
 
  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used
 
 What if you compare a QuuxNationality Str with a FooNationality Str?
 That should blow up. Also it can lead to action at a distance. I don't
 think that's the way to go.

It's half right;  the coding set should be part of the type.  Explicit
conversion is probably a good idea too.

-- 
brandon s. allbery [linux,solaris,freebsd,perl]  allb...@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH


Re: Suggested magic for a .. b

2010-07-30 Thread Leon Timmermans
On Thu, Jul 29, 2010 at 9:51 PM, Aaron Sherman a...@ajs.com wrote:
 My only strongly held belief, here, is that you should not try to answer any
 of these questions for the default range operator on
 unadorned, context-less strings. For that case, you must do something that
 makes sense for all Unicode codepoints in nearly all contexts.

I find that both of limited use and the only sane possibility at the
same time :-|

Leon


Re: Suggested magic for a .. b

2010-07-30 Thread Doug McNutt
Please pardon intrusion by a novice who is anything but object oriented.

I consider myself a long time user of perl 5. I love it and it has completely 
replaced FORTRAN as my compiler of choice. Programming Perl is so dog-eared 
that I may need a replacement. I joined this list when I thought the ... 
operators might allow for vector operations like cross product, dot product, 
curl, grad, and divergence.  I was mistaken but was pleased that such things 
would be possible as add-ins to be created later.

I have never used the .. operator on perl 5, mostly because I can't 
understand it.

I have actually wished for, in perl 5, an ability to create a list, really an 
unsorted set with an @theset kind of description that I could create with a 
regular expression. All ASCII strings that would match would become members of 
@theset.

@theset = /\A2N\d\d\d\d\Z/;

would create a temporary array of transistors that have 2N, once 
military, designations. That list would become an input to some other code that 
would look for datasheets. Memory intensive but easy to understand.

Are you guys sure that the ... and .. operators in perl 6 shouldn't make 
use of regular expression syntax while deciding just what is intended by the 
programmer?

-- 
--  The best programming tool is a soldering iron --


Re: Suggested magic for a .. b

2010-07-30 Thread Aaron Sherman
On Fri, Jul 30, 2010 at 6:45 PM, Doug McNutt dougl...@macnauchtan.com wrote:
 Please pardon intrusion by a novice who is anything but object oriented.

No problem. Sometimes a fresh perspective helps to illuminate things.

Skipping ahead...

 Are you guys sure that the ... and .. operators in perl 6 shouldn't make 
 use of regular expression syntax while deciding just what is intended by the 
 programmer?

You kind of blew my mind, there. I tried to respond twice and each
time I determined that there was a way around what I was about to call
crazy.

In the end, I'm now questioning the difference between a junction and
a Range... which is not where I thought this would go. Good question,
though I should point out that you could never reasonably listify a
range constructed from a regex because reversing a regex like that
immediately runs into some awful edge cases. Still, interesting stuff.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for a .. b

2010-07-30 Thread Jon Lang
Aaron Sherman wrote:
 In the end, I'm now questioning the difference between a junction and
 a Range... which is not where I thought this would go.

Conceptually, they're closely related.  In particular, a range behaves
a lot like an any() junction.  Some differences:

1. An any() junction always has a discrete set of options in it; but a
Range could (and generally does) have a continuous set of options.

2. An any() junction can have an arbitrary set of options; a Range's
set of options is defined entirely by its endpoints.
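
The two differences listed above can be sketched in Python. The `Interval` class is a hypothetical stand-in for a Range and a plain `set` stands in for an any() junction; neither reproduces Perl 6 semantics exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval, defined entirely by its two endpoints."""
    lo: float
    hi: float

    def __contains__(self, x):
        return self.lo <= x <= self.hi


any_of = {1, 2, 3}          # discrete, arbitrary options, like any(1, 2, 3)
interval = Interval(1, 3)   # every value between the endpoints

print(2 in any_of, 2 in interval)       # True True
print(2.5 in any_of, 2.5 in interval)   # False True
```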

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-29 Thread Jon Lang
On Wed, Jul 28, 2010 at 10:35 PM, Brandon S Allbery KF8NH
allb...@ece.cmu.edu wrote:
  On 7/28/10 8:07 PM, Michael Zedeler wrote:
 On 2010-07-29 01:39, Jon Lang wrote:
 Aaron Sherman wrote:
 In smart-match context, "a".."b" includes "aardvark".
 No one has yet explained to me why that makes sense. The continued
 use of
 ASCII examples, of course, doesn't help. Does "a" .. "b" include
 "æther"?
 This is where Germans and Swedes, for example, don't agree, but
 they're all
 using the same Latin code blocks.
 This is definitely something for the Unicode crowd to look into.  But
 whatever solution you come up with, please make it compatible with the
 notion that "aardvark".."apple" can be used to match any word in the
 dictionary that comes between those two words.
 The key issue here is whether there is a well defined and meaningful
 ordering of the characters in question. We keep discussing the nice
 examples, but how about "apple" .. "ส้ม"?

 I thought that was already disallowed by spec.

As a range, it ought to work; it's only when you try to generate a
list from it that you run into trouble, as the spec currently assumes
that "z".succ eqv "aa".

Anyway: whatever default algorithm we go with for resolving cmp, I
strongly recommend that we define the default .succ so that $x lt
$x.succ is always true.
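
A small Python sketch of why the rollover successor conflicts with plain code point comparison. The functions `succ` and `shortlex_key` are hypothetical helpers that handle lowercase ASCII words only. The length-first ordering is just one ordering under which the recommendation would hold for these inputs, not something the spec defines.

```python
def succ(s):
    """Successor of a lowercase ASCII word, with "z" rolling over to "aa"."""
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"   # carry into the next position to the left
        i -= 1
    return "a" + "".join(chars)  # every position overflowed: grow by one


def shortlex_key(s):
    """Order by length first, then by code points."""
    return (len(s), s)


for word in ["a", "az", "z", "zz"]:
    nxt = succ(word)
    # columns: word, successor, code point "lt", length-first "lt"
    print(word, nxt, word < nxt, shortlex_key(word) < shortlex_key(nxt))
```

Under code point comparison the third column is false for "z" and "zz", because "aa" sorts before "z"; under the length-first key it is true for all four words.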

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-29 Thread Leon Timmermans
On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan dar...@darrenduncan.net wrote:
 Some possible examples of customization:

  $foo ~~ $a..$b :QuuxNationality  # just affects this one test

I like that

  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used


What if you compare a QuuxNationality Str with a FooNationality Str?
That should blow up. Also it can lead to action at a distance. I don't
think that's the way to go.

Leon


Re: Suggested magic for a .. b

2010-07-29 Thread yary
On Thu, Jul 29, 2010 at 5:15 AM, Leon Timmermans faw...@gmail.com wrote:
 On Thu, Jul 29, 2010 at 3:24 AM, Darren Duncan dar...@darrenduncan.net 
 wrote:
 Some possible examples of customization:

  $foo ~~ $a..$b :QuuxNationality  # just affects this one test

 I like that

  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used


 What if you compare a QuuxNationality Str with a FooNationality Str?
 That should blow up. Also it can lead to action at a distance. I don't
 think that's the way to go.

I think it's an elegant use of encapsulation- keeping a string's
locale with the string. If you want to compare two strings with
different collations, either-
 $foo ~~ $a..$b :QuuxNationality  # override the locales for this test
or
  $foo ~~ $a..$b # Perl warns about conflict, and falls back to its default

-y


Re: Suggested magic for a .. b

2010-07-29 Thread Aaron Sherman
On Wed, Jul 28, 2010 at 9:24 PM, Darren Duncan dar...@darrenduncan.net wrote:

 Jon Lang wrote:

 I don't know enough about Unicode to suggest how to solve this.


Thankfully, I know little enough to take up the challenge ;-)


  All I can
 say is that my example above should never return a valid Range object
 unless
 there is a way I can specify my own ordering and I use it.


Please see my suggested approach way, way back at the start of all this. Use
Unicode scripts, properties and codepoint sequences to produce a list of
codepoints. Want something more meaningful than codepoints? Great, use an
object that knows what you're asking for:

   EnglishDictWord("apple") .. EnglishDictWord("orange")

It's a very Perl way to approach a problem: provide the solution that meets
the least common denominator need (return a range object that represents
ranges based on the information we have) and then allow that same feature to
be used in cases where the user has provided sufficient context to do
something smarter.

I don't think it makes sense to extend the length of strings under
consideration by default. Obviously the above example would include
"blackberry" because you've asked it to consider English dictionary words,
but "aa" .. "zz" shouldn't contain "blackberry" because you don't have
enough data to understand what's being asked for, and thus should fall back
to treating strings as lists of codepoints (speaking of which do we define a
behavior for (1,2,3) .. (4,5,6)? Right now, we consider (1,2,7) to be in
that range, and I don't think that's a terribly useful result).
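
Both membership results mentioned above fall out of ordinary lexicographic comparison, which Python also uses for tuples and strings. In this sketch `in_range` and `same_length_in_range` are hypothetical helpers. The second is only one possible reading of "don't extend the length by default", not a rule taken from the spec.

```python
def in_range(lo, x, hi):
    """Closed-interval membership using the type's built-in ordering."""
    return lo <= x <= hi


def same_length_in_range(lo, x, hi):
    """Like in_range, but only for candidates as long as both endpoints."""
    return len(lo) == len(x) == len(hi) and lo <= x <= hi


print(in_range((1, 2, 3), (1, 2, 7), (4, 5, 6)))       # True
print(in_range("aa", "blackberry", "zz"))               # True
print(same_length_in_range("aa", "blackberry", "zz"))   # False
print(same_length_in_range("aa", "mm", "zz"))           # True
```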




 That actually says something: it says that we may want to reconsider
 the notion that all string values can be sorted.  You're suggesting
 the possibility that "a" cmp "ส้" is, by default, undefined.



By default, I think it should be +1 because of the codepoint comparison. If
you then tell Perl that you want that comparison done in a Thai context,
then it's probably -1.

The golden rule of Unicode is: never pretend you have more information than
you do.




 I think that a general solution here is to accept that there may be more
 than one valid way to sort some types, strings especially, and so
 operators/routines that do sorting should be customizable in some way so
 users can pick the behaviour they want.


And I think that this brings you back to what I was saying at the top of the
thread which is that the most basic approach treats each codepoint as a
collection of information and sorts on that information first and then the
codepoint number itself. If that's not useful to you, tell Perl what you
really wanted.



 Some possible examples of customization:

  $foo ~~ $a..$b :QuuxNationality  # just affects this one test

  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used


That's a bit too easy to read without thinking about the implications. I
bring back my original example from long ago:

TOPIXコンポジット1500構成銘柄 which I shamelessly grabbed from a Tokyo Stock
Exchange page. That one string, used in everyday text, contains Latin
letters, Hiragana [I lied, there's no Hiragana], Katakana, Han or Kanji
ideograms and Latin digits.


Now call .succ on that sucker, I dare you, keeping in mind that there's no
one Japanese script in Unicode. I think the only valid starting point
without any contextual information is to essentially treat it as a sequence
of codepoints (as if it were an array of integers) and do something
marginally sane on that basis. Then you let the user provide you with hints.
Yes, it's Japanese language but that doesn't tell you as much as you'd
hope, since many of the rules come from the languages that Japanese is
borrowing from, here.

One answer is to break it down on script and major category property
boundaries into TOPIX (Latin: the name of an index), コンポジット (Katakana:
phonetically this is konpozito or composite), 1500 (Latin digits), and
構成銘柄 (Kanji ideographs: constituents). Now, treat each one of those as a
separate sequence of codepoints and begin incrementing each sub-sequence in
turn. You could also apply Japanese sorting rules to the successor method,
but then you get into questions of what the Japanese sorting method is for
Latin letters... probably a solved problem, but obscure enough that I'll bet
there are edge cases that are NOT solvable just by knowing the locale
because they are finer grained (e.g. which Latin-using language does the
word come from? What source language is most appropriate for the context?
etc.)
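
The splitting step described above can be approximated in Python. As far as I know the standard `unicodedata` module does not expose the Unicode Script property, so this sketch uses the first word of each character's Unicode name as a rough stand-in. That shortcut, and the names `rough_script` and `script_runs`, are assumptions made for illustration only.

```python
import itertools
import unicodedata


def rough_script(ch):
    """First word of the Unicode character name, e.g. LATIN, KATAKANA, CJK."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_runs(text):
    """Split text into maximal runs of characters sharing a rough script."""
    text = unicodedata.normalize("NFC", text)  # keep voiced kana as one code point
    return [(script, "".join(run))
            for script, run in itertools.groupby(text, key=rough_script)]


for script, run in script_runs("TOPIXコンポジット1500構成銘柄"):
    print(script, run)
```

This yields four runs labelled LATIN, KATAKANA, DIGIT and CJK. Incrementing each run is a separate problem that the sketch does not attempt.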

Maybe you throw an exception when you try to tell Perl that 
TOPIXコンポジット1500構成銘柄 is a Japanese string... but then Perl is rejecting
strings that are considered valid in some contexts within that language.

My only strongly held belief, here, is that you should not try to answer any
of these questions for the default range operator on
unadorned, context-less strings. For that case, you must do something that
makes sense for all Unicode codepoints in nearly all contexts.

-- 
Aaron 

Re: Suggested magic for a .. b

2010-07-28 Thread Michael Zedeler

On 2010-07-28 06:54, Martin D Kealey wrote:

On Wed, 28 Jul 2010, Michael Zedeler wrote:
   

Writing for ($a .. $b).reverse -> $c { ... } may then blow up because it
turns out that $b doesn't have a .succ method when coercing to sequence
(where the LHS must have an initial value), just like
 for $a .. $b -> $c { ... }
should be able to blow up because the LHS of a Range shouldn't have to
support .succ.
 

Presumably you'd only throw that exception if, as well, $b doesn't support .pred ?
   
Yes. It should be .pred. So ($a .. $b).reverse is only possible if 
$b.pred is defined and $a.gt is defined (and taking an object that has 
the type of $b.pred). If the coercion to Sequence is taking place first, 
we'll have to live with two additional constraints ($b.lt and $a.succ), 
but I guess it would be easy to overload .reverse and get rid of those.


Regards,

Michael.





Re: Suggested magic for a .. b

2010-07-28 Thread Darren Duncan

Michael Zedeler wrote:
This is exactly why I keep writing posts about Ranges being defunct as 
they have been specified now. If we accept the premise that Ranges are 
supposed to define a kind of linear membership specification between two 
starting points (as in math), it doesn't make sense that the LHS has an 
additional constraint (having to provide a .succ method). All we should 
require is that both endpoints support comparison (that they share a 
common type with comparison, at least).


Yes, I agree 100%.  All that should be required to construct a range 
$foo..$bar is that the endpoints are comparable, meaning $foo cmp $bar 
works.  Having a .pred or .succ for $foo|$bar should not be required to define a 
range but only to use that range as a generator. -- Darren Duncan
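
A Python sketch of the separation agreed on above: constructing a range and testing membership need only comparison, while generating values additionally needs a successor. The `Range` class and its optional `succ` argument are hypothetical, not the Perl 6 API.

```python
class Range:
    """Interval that needs only comparison; iteration needs a successor."""

    def __init__(self, lo, hi, succ=None):
        self.lo, self.hi, self.succ = lo, hi, succ

    def __contains__(self, x):
        # Membership uses comparison only.
        return self.lo <= x <= self.hi

    def __iter__(self):
        # Generation fails (on first use) when no successor was supplied.
        if self.succ is None:
            raise TypeError("range has no successor function; cannot generate")
        x = self.lo
        while x <= self.hi:
            yield x
            x = self.succ(x)


reals = Range(0.0, 1.0)                     # comparable, not enumerable
ints = Range(1, 4, succ=lambda n: n + 1)    # comparable and enumerable

print(0.5 in reals)   # True
print(list(ints))     # [1, 2, 3, 4]
```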


Re: Suggested magic for a .. b

2010-07-28 Thread Dave Whipp

Michael Zedeler wrote:

This is exactly why I keep writing posts about Ranges being defunct as 
they have been specified now. If we accept the premise that Ranges are 
supposed to define a kind of linear membership specification between two 
starting points (as in math), it doesn't make sense that the LHS has an 
additional constraint (having to provide a .succ method). All we should 
require is that both endpoints support comparison (that they share a 
common type with comparison, at least).


To squint at this slightly, in the context that we already have 0...1e10 
as a sequence generator, perhaps the semantics of iterating a range 
should be unordered -- that is,


  for 0..10 -> $x { ... }

is treated as

  for (0...10).pick(*) -> $x { ... }

Then the whole question of reversibility is moot. Plus, there would then 
be a useful distinction for serialization of C<..> vs C<...> (perhaps we 
should even parallelize). When you have two very similar operators it's 
often good to maximize the semantic distance between them so that people 
don't get into the lazy habit of using them without thinking.


Re: Suggested magic for a .. b

2010-07-28 Thread Jon Lang
Dave Whipp wrote:
 To squint at this slightly, in the context that we already have 0...1e10 as
 a sequence generator, perhaps the semantics of iterating a range should be
 unordered -- that is,

  for 0..10 -> $x { ... }

 is treated as

  for (0...10).pick(*) -> $x { ... }

 Then the whole question of reversibility is moot.

No thanks; I'd prefer it if $a..$b have analogous meanings in item and
list contexts.  As things stand, 10..1 means, in item context,
numbers that are greater or equal to ten and less than or equal to
one, which is equivalent to nothing; in list context, it means an
empty list. This makes sense to me; having it provide a list
containing the numbers 1 through 10 creates a conflict between the two
contexts regardless of how they're arranged.
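
The consistency described above, modelled in Python for integers. `matches` stands in for the item-context test and `as_list` for the list-context expansion; both names are hypothetical.

```python
def matches(x, lo, hi):
    """Item context: is x inside the closed interval lo..hi?"""
    return lo <= x <= hi


def as_list(lo, hi):
    """List context: the integers from lo up to hi, ascending only."""
    return list(range(lo, hi + 1))


print(as_list(10, 1))                                     # []
print(any(matches(x, 10, 1) for x in range(-100, 100)))   # False
```

An inverted range matches nothing and lists nothing, so the two contexts agree.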

As I see it, C<< $a..$b >> in list context is a useful shorthand for
C<< $a, *.succ ... $b >>.  You only get into trouble when you start trying
to have infix:<..> do more than that in list context.

If anything needs to be done with respect to infix:<..>, it lies in
changing the community perception of the operator.  The only reason
why we're having this debate at all is that in Perl 5, the .. operator
was used to generate lists; so programmers coming from Perl 5 start
with the expectation that that's what it's for in Perl 6, too.  That
expectation needs to be corrected as quickly as can be managed, not
catered to.  But that's not a matter of language design; it's a matter
to be addressed by whoever's going to be writing the Perl 6 tutorials.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-28 Thread Moritz Lenz
Dave Whipp wrote:
 To squint at this slightly, in the context that we already have 0...1e10 
 as a sequence generator, perhaps the semantics of iterating a range 
 should be unordered -- that is,
 
for 0..10 -> $x { ... }
 
 is treated as
 
for (0...10).pick(*) -> $x { ... }

Sorry, I have to ask. Are you serious? Really?

Cheers,
Moritz


Re: Suggested magic for a .. b

2010-07-28 Thread yary
On Wed, Jul 28, 2010 at 8:34 AM, Dave Whipp d...@dave.whipp.name wrote:
 To squint at this slightly, in the context that we already have 0...1e10 as
 a sequence generator, perhaps the semantics of iterating a range should be
 unordered -- that is,

  for 0..10 -> $x { ... }

 is treated as

  for (0...10).pick(*) -> $x { ... }

Makes me think about parallel operations.

for 0...10 -> $x { ... } # 0 through 10 in order
for 0..10 -> $x { ... } # Spawn 11 threads, $x=0 through 10 concurrently
for 10..0 -> $x { ... } # A no-op
for 10...0 -> $x { ... } # 10 down to 0 in order

though would a parallel batch of an anonymous block be more naturally written as
all(0...10) -> $x { ... } # Spawn 11 threads

-y


Re: Suggested magic for a .. b

2010-07-28 Thread Moritz Lenz
yary wrote:
 though would a parallel batch of an anonymous block be more naturally written 
 as
 all(0...10) -> $x { ... } # Spawn 11 threads

No,

hyper for 0..10 -> $x { ... } # spawn as many threads
# as the compiler thinks are reasonable

I think one (already specced) syntax for the same thing is enough,
especially considering that hyper operators also do the same job.
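
For readers unfamiliar with the idea, here is a rough Python analogue of a loop whose iterations may run concurrently while results keep their input order. It uses the standard `concurrent.futures` thread pool; `body` and `hyper_for` are hypothetical names. This models only the general idea, not the specced behaviour of `hyper`.

```python
from concurrent.futures import ThreadPoolExecutor


def body(x):
    """Stand-in for the loop body; any independent computation would do."""
    return x * x


def hyper_for(values, func):
    """Run func over values on a thread pool, results in input order."""
    with ThreadPoolExecutor() as pool:   # pool size is left to the library
        return list(pool.map(func, values))


print(hyper_for(range(0, 11), body))
```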

Cheers,
Moritz


Re: Suggested magic for a .. b

2010-07-28 Thread TSa (Thomas Sandlaß)
On Wednesday, 28. July 2010 05:12:52 Michael Zedeler wrote:
 Writing ($a .. $b).reverse doesn't make any sense if the result were a
 new Range, since Ranges should then only be used for inclusion tests (so
 swapping endpoints doesn't have any meaningful interpretation), but
 applying .reverse could result in a coercion to Sequence.

Swapping the endpoints could mean swapping inside test to outside
test. The only thing that is needed is to swap from && to ||:

   $a .. $b   # means  $a <= $_ && $_ <= $b  if $a < $b
   $b .. $a   # means  $b <= $_ || $_ <= $a  if $a < $b

Regards TSa.
-- 
The unavoidable price of reliability is simplicity -- C.A.R. Hoare
Simplicity does not precede complexity, but follows it. -- A.J. Perlis
1 + 2 + 3 + 4 + ... = -1/12  -- Srinivasa Ramanujan


Re: Suggested magic for a .. b

2010-07-28 Thread yary
 Swapping the endpoints could mean swapping inside test to outside
 test. The only thing that is needed is to swap from && to ||:

 $a .. $b # means $a <= $_ && $_ <= $b if $a < $b
 $b .. $a # means $b <= $_ || $_ <= $a if $a < $b

I think that's what not, ! are for!


Re: Suggested magic for a .. b

2010-07-28 Thread Jon Lang
TSa wrote:
 Swapping the endpoints could mean swapping inside test to outside
 test. The only thing that is needed is to swap from && to ||:

   $a .. $b   # means  $a <= $_ && $_ <= $b  if $a < $b
   $b .. $a   # means  $b <= $_ || $_ <= $a  if $a < $b

This is the same sort of discontinuity of meaning that was causing
problems with Perl 5's use of negative indices to count backward from
the end of a list; there's a reason why Perl 6 now uses the [*-$a]
notation for that sort of thing.

Consider a code snippet where the programmer is given two values: one
is a minimum value which must be reached; the other is a maximum value
which must not be exceeded.  In this example, the programmer does not
know what the values are; for all he knows, the minimum threshold
exceeds the maximum.  As things stand, it's trivial to test whether or
not your sample value is viable: if $x ~~ $min .. $max, then you're
golden: it doesn't matter what $min cmp $max is.  With your change,
I'd have to replace the above with something along the lines of:
  if $min <= $max && $x ~~ $min .. $max { ... } - because if $min >
$max, the algorithm will accept values that are well below the minimum
as well as values that are well above the maximum.
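
The min/max scenario above, sketched in Python with both semantics side by side. `in_range_current` and `in_range_swapped` are hypothetical names for the existing behaviour and for the proposed inside-out behaviour.

```python
def in_range_current(x, lo, hi):
    """Current semantics: an inverted range matches nothing."""
    return lo <= x <= hi


def in_range_swapped(x, lo, hi):
    """Proposed semantics: inverted endpoints turn the test inside out."""
    if lo <= hi:
        return lo <= x <= hi
    return lo <= x or x <= hi


minimum, maximum = 10, 5   # unsatisfiable: the minimum exceeds the maximum
for x in (3, 7, 12):
    print(x, in_range_current(x, minimum, maximum),
          in_range_swapped(x, minimum, maximum))
```

With the swapped semantics, 3 and 12 are accepted even though neither satisfies both thresholds, which is why the extra guard on the endpoints would become necessary.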

Keep it simple, folks!  There are enough corner cases in Perl 6 as
things stand; we don't need to be introducing more of them if we can
help it.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-28 Thread Mark J. Reed
On Wednesday, July 28, 2010, Jon Lang datawea...@gmail.com wrote:
 Keep it simple, folks!  There are enough corner cases in Perl 6 as
 things stand; we don't need to be introducing more of them if we can
 help it.

Can I get an Amen?  Amen!


-- 
Mark J. Reed markjr...@gmail.com


Re: Suggested magic for a .. b

2010-07-28 Thread Mark J. Reed
On Wed, Jul 28, 2010 at 2:30 PM, Chris Fields cjfie...@illinois.edu wrote:
 On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:
 Can I get an Amen?  Amen!
 --
 Mark J. Reed markjr...@gmail.com

 +1.  I'm agnostic ;

Militant?  :)  ( http://tinyurl.com/3xjgxnl )

Nothing inherently religious about amen (or me), but I'll accept
+1 as synonymous.   :)

-- 
Mark J. Reed markjr...@gmail.com


Re: Suggested magic for a .. b

2010-07-28 Thread Dave Whipp

Moritz Lenz wrote:

Dave Whipp wrote:

   for 0..10 -> $x { ... }
is treated as
   for (0...10).pick(*) -> $x { ... }


Sorry, I have to ask. Are you serious? Really?


Ah, to reply, or not to reply, to rhetorical sarcasm ... In this case, I 
think I will:


Was my specific proposal entirely serious: only in that it was an 
attempt to broaden the box for the discussion of semantics of coercion 
ranges. One of the banes of my life is to undo the sequential mindset 
that so many programmers have. I like to point out that 
sequentialization is an optimization to make programs run faster on 
Von-Neumann architectures. Often, it's premature. Most of the time it 
doesn't matter (compilers, and even HW, can extract ILP), but every now 
and again it results in an unfortunate barrier in solution-space.


Why do we assume that ranges iterate in .succ order -- or even that they 
iterate as integers (and are finite). Why not iterate as a top-down 
breadth-first generation of a Cantor set? etc. Does the language need to 
choose a default, or is it better to require the programmer to state how 
they want to coerce the range to the seq? Ten years from now, we'll keep 
needing to refer questions to the .. vs ... faq.


Re: Suggested magic for a .. b

2010-07-28 Thread Moritz Lenz
Dave Whipp wrote:
 Moritz Lenz wrote:
 Dave Whipp wrote:
for 0..10 -> $x { ... }
 is treated as
for (0...10).pick(*) -> $x { ... }
 
 Sorry, I have to ask. Are you serious? Really?
 
 Ah, to reply, or not to reply, to rhetorical sarcasm ... In this case, I 
 think I will:

No sarcasm involved, just curiosity.

 Was my specific proposal entirely serious: only in that it was an 
 attempt to broaden the box for the discussion of semantics of coercion 
 ranges.

I fear what Perl 6 needs is not to broaden the range of discussion even
further, but to narrow it down to the essential points. Personal opinion
only.

 Why do we assume that ranges iterate in .succ order -- or even that they 
 iterate as integers (and are finite). Why not iterate as a top-down 
 breadth-first generation of a Cantor set?

That's easy: Principle of least surprise.

Cheers.
Moritz


Re: Suggested magic for a .. b

2010-07-28 Thread Dave Whipp

Moritz Lenz wrote:


I fear what Perl 6 needs is not to broaden the range of discussion even
further, but to narrow it down to the essential points. Personal opinion
only.


OK, as a completely serious proposal, the semantics of "for 0..10 { ... }" 
should be for the compiler to complain "sorry, that's a perl5ism: in 
perl6, please use a C<...> or explicit coercion of the range to a sequence".



(BTW, I thought a bit more about my previous suggestion: there is 
precedent in that %hash.keys is unordered -- so it's not entirely 
obvious that a default range coercion should be ordered)


Re: Suggested magic for a .. b

2010-07-28 Thread Aaron Sherman
On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp d...@dave.whipp.name wrote:

 To squint at this slightly, in the context that we already have 0...1e10 as
 a sequence generator, perhaps the semantics of iterating a range should be
 unordered -- that is,

  for 0..10 -> $x { ... }

 is treated as

  for (0...10).pick(*) -> $x { ... }


As others have pointed out, this has some problems. You can't implement 0..*
that way, just for starters.


 Then the whole question of reversibility is moot.


Really? I don't think it is. In fact, you've simply made the problem pop up
everywhere, and guaranteed that .. must behave totally unlike any other
iterator.

Getting back to 10..0...

The complexity of implementation argument doesn't really hold for me, as:

   (a..b).list = a > b ?? a,*.pred ... b !! a,*.succ ... b

Is pretty darned simple and does not require that b implement anything more
than it does under the current implementation. a, on the other hand, now has
to (optionally, since throwing an exception is the alternative) implement
one more method.
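
The one-line definition above can be sketched in Python. `range_list` and its `succ`/`pred` parameters are hypothetical; they default to integer steps purely for illustration. Note that only values derived from `a` are ever stepped, so `b` needs nothing beyond comparison.

```python
def range_list(a, b, succ=lambda n: n + 1, pred=lambda n: n - 1):
    """List the values from a to b, stepping down when a > b."""
    descending = a > b
    step = pred if descending else succ
    values, x = [], a
    while (x >= b) if descending else (x <= b):
        values.append(x)
        x = step(x)
    return values


print(range_list(0, 3))   # [0, 1, 2, 3]
print(range_list(3, 0))   # [3, 2, 1, 0]
```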

The more I look at this, the more I think ".." and "..." are reversed. ".."
has a very specific and narrow usage (comparing ranges) and "..." is
probably going to be the most broadly used operator in the language outside
of quotes, commas and the basic, C-derived math and logic ops. Many (most?)
loops will involve "...". Most array initializers will involve "...". Why
are we not calling that ".."? Just because we defined ".." first, and it
grandfathered its way in the door? Because it resembles the math op? These
don't seem like good reasons.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for a .. b

2010-07-28 Thread yary
On Wed, Jul 28, 2010 at 2:29 PM, Aaron Sherman a...@ajs.com wrote:

 The more I look at this, the more I think .. and ... are reversed. ..
 has a very specific and narrow usage (comparing ranges) and ... is
 probably going to be the most broadly used operator in the language outside
 of quotes, commas and the basic, C-derived math and logic ops.

+1

Though it being the day before Rakudo *'s first release makes me
think, too late!

-y


Re: Suggested magic for a .. b

2010-07-28 Thread Leon Timmermans
On Wed, Jul 28, 2010 at 11:29 PM, Aaron Sherman a...@ajs.com wrote:
 The more I look at this, the more I think ".." and "..." are reversed. ".."
 has a very specific and narrow usage (comparing ranges) and "..." is
 probably going to be the most broadly used operator in the language outside
 of quotes, commas and the basic, C-derived math and logic ops. Many (most?)
 loops will involve "...". Most array initializers will involve "...". Why
 are we not calling that ".."? Just because we defined ".." first, and it
 grandfathered its way in the door? Because it resembles the math op? These
 don't seem like good reasons.

I was thinking the same. Switching them seems better from a huffmanization POV.

Leon


Re: Suggested magic for a .. b

2010-07-28 Thread Darren Duncan

Aaron Sherman wrote:

The more I look at this, the more I think ".." and "..." are reversed. ".."
has a very specific and narrow usage (comparing ranges) and "..." is
probably going to be the most broadly used operator in the language outside
of quotes, commas and the basic, C-derived math and logic ops. Many (most?)
loops will involve "...". Most array initializers will involve "...". Why
are we not calling that ".."? Just because we defined ".." first, and it
grandfathered its way in the door? Because it resembles the math op? These
don't seem like good reasons.


I would rather that .. stay with intervals and ... with generators.  The 
mnemonics make more sense that way.  Having .. resemble the math op with the 
same meaning, intervals, is a good thing.  Besides comparing ranges, an interval 
would also often be used for a membership test, e.g. $a <= $x <= $b would 
alternately be spelled $x ~~ $a..$b.  I would imagine that the 
interval use would be more common than the generator use in some problem 
domains. -- Darren Duncan
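
As a small illustration of the membership test described above, the following sketch checks the chained comparison and the smart-match side by side in present-day Raku. The variable names $lo and $hi and the sample values are assumptions standing in for $a and $b.

```raku
my ($lo, $hi) = 1, 10;          # stand-ins for $a and $b in the text

for 0, 5, 10, 11 -> $x {
    my $chained = $lo <= $x <= $hi;      # explicit comparison chain
    my $matched = $x ~~ $lo..$hi;        # smart-match against the interval
    say "$x $chained $matched";
}
# 0 False False
# 5 True True
# 10 True True
# 11 False False
```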


Re: Suggested magic for a .. b

2010-07-28 Thread Darren Duncan

Darren Duncan wrote:

Aaron Sherman wrote:
The more I look at this, the more I think .. and ... are reversed. 

<snip>
I would rather that .. stay with intervals and ... with generators.  

<snip>

Another thing to consider if one is looking at huffmanization is how often the 
versions that exclude endpoints would be used, such as ^..^.


I would imagine that a sequence generator would also find this variability 
useful.

Does ... also come with the 4 variations of endpoint inclusion/exclusion?

If not, then it should, as I'm sure many times one would want to do this, say:

  for 0...^$n -> {...}

In any event, I still think that the mnemonics of ... (yadda-yadda-yadda) are 
more appropriate to a generator, where it says "produce this and so on".  A .. 
does not have that mnemonic and looks better for an interval.


-- Darren Duncan


Re: Suggested magic for a .. b

2010-07-28 Thread Dave Whipp

Aaron Sherman wrote:

On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp d...@dave.whipp.name wrote:


To squint at this slightly, in the context that we already have 0...1e10 as
a sequence generator, perhaps the semantics of iterating a range should be
unordered -- that is,

 for 0..10 -> $x { ... }

is treated as

 for (0...10).pick(*) -> $x { ... }



As others have pointed out, this has some problems. You can't implement 0..*
that way, just for starters.


I'd say that's a point in my favor: it demonstrates that integers and 
strings have similar problems. If you pick items from an infinite set 
then every item you pick will have an infinite number of digits/characters.


In smart-match context, "a".."b" includes "aardvark". It follows that, 
unless you're filtering/shaping the sequence of generated items, then 
almost every element of ("a".."b").Seq starts with an infinite number of "a"s.


Consistent semantics would make "a".."b" very not-useful when used as a 
sequence: the user needs to say how they want to avoid the infinities. 
Similarly (0..1).Seq should most likely return Real numbers -- and thus 
(0..1).pick(*) can be approximated by (0..1).pick(*, :replace), which is 
much easier to implement.


So either you define some arbitrary semantics (what those should be is, 
I think, the original topic of this thread) or else you punt (error 
message). An error message has the advantage that you can always do 
something useful, later.



Then the whole question of reversibility is moot.

Really? I don't think it is. In fact, you've simply made the problem pop up
everywhere, and guaranteed that .. must behave totally unlike any other
iterator.


%hash.keys has similarly unordered semantics. Therefore 
%hash.keys.reverse is, for most purposes, equivalent to %hash.keys. That 
is why I said the question of reversibility becomes moot if you define 
the collapse of a range to a sequence to be unordered. It also 
demonstrates precedent, so not totally unlike any other.


Even though it was only a semi-serious proposal, I seem to find myself 
defending it. So maybe I was serious, after all. That argument for DWIM 
being ordered pretty much goes away once you tell people to use ... 
for what they intended to mean.




Getting back to 10..0


Yes, I agree with Jon that this should be an empty range. I don't care 
what order you pick the elements from an empty range :).


Re: Suggested magic for a .. b

2010-07-28 Thread Darren Duncan

Dave Whipp wrote:

Similarly (0..1).Seq should most likely return Real numbers


No it shouldn't, because the endpoints are integers.

If you want Real numbers, then say 0.0 .. 1.0 instead.

-- Darren Duncan


Re: Suggested magic for a .. b

2010-07-28 Thread Dave Whipp

Darren Duncan wrote:

Dave Whipp wrote:

Similarly (0..1).Seq should most likely return Real numbers


No it shouldn't, because the endpoints are integers.

If you want Real numbers, then say 0.0 .. 1.0 instead.

-- Darren Duncan


That would be inconsistent. $x ~~ 0..1 means 0 <= $x <= 1. The fact that 
the endpoints are integers does not imply that the range does not include 
non-integer reals.


My argument is that iterating a range could be defined to give you a 
uniform distribution of values that would smart match true against that 
range -- and that such a definition would be just as reasonable as (and 
perhaps more general than) one that says that you get an incrementing 
ordered set of integers across that range.
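
To make the two readings concrete, here is a small sketch in present-day Raku. It shows how the language eventually behaved rather than anything settled in this thread; $sample is an illustrative name, and using Range.rand as the "uniform value from the interval" is an assumption about what such a definition would look like.

```raku
say 0.5 ~~ 0..1;        # True: the interval contains non-integer reals
say (0..1).list;        # (0 1): iteration steps through integers only

my $sample = (0..1).rand;   # a pseudo-random value drawn from the interval
say $sample ~~ 0..1;        # True
```

The first two lines are the inconsistency being argued about: smart-matching treats 0..1 as a continuous interval, while list coercion treats it as a discrete one.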


Re: Suggested magic for a .. b

2010-07-28 Thread Aaron Sherman
On Wed, Jul 28, 2010 at 6:24 PM, Dave Whipp d...@dave.whipp.name wrote:

 Aaron Sherman wrote:

 On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp d...@dave.whipp.name wrote:

  To squint at this slightly, in the context that we already have 0...1e10 as
  a sequence generator, perhaps the semantics of iterating a range should be
  unordered -- that is,

  for 0..10 -> $x { ... }

 is treated as

  for (0...10).pick(*) -> $x { ... }


 As others have pointed out, this has some problems. You can't implement 0..*
 that way, just for starters.


 I'd say that's a point in my favor: it demonstrates that integers and
 strings have similar problems. If you pick items from an infinite set then
 every item you pick will have an infinite number of digits/characters.


So, if I understand you correctly, you're happy about the fact that
iterating over an explicitly lazy range would immediately result in
failure? Sorry, not following.



 In smart-match context, "a".."b" includes "aardvark".


No one has yet explained to me why that makes sense. The continued use of
ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
This is where Germans and Swedes, for example, don't agree, but they're all
using the same Latin code blocks.

I don't think you can reasonably bring locale into this. I think it needs to
be purely a codepoint-oriented operator. If you bring locale into it, then
the argument for not including composing and modifying characters goes out
the window, and you're stuck in what I believe Dante called the Unicode
circle. If you treat this as a codepoint-based operator then you get a very
simple result: "a".."b" is the range between the codepoint for "a" and the
codepoint for "b". "aa" .. "bb" is the range between a sequence of two
codepoints and a sequence of two other codepoints, which you can define in
a number of ways (we've discussed a few, here) which don't involve having to
expand the sequences to three or more codepoints.
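
A short sketch of what a purely codepoint-based comparison gives for the "æther" question, using present-day Raku's default string cmp (which, as far as I know, orders by codepoint and ignores locale). It illustrates the codepoint ordering only; it says nothing about which length-restricted semantics the range should have.

```raku
say 'a'.ord;                  # 97
say 'b'.ord;                  # 98
say 'æ'.ord;                  # 230
say 'æther' cmp 'b';          # More: 230 sorts after 98
say 'æther' ~~ 'a'..'b';      # False under codepoint ordering
```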

I've never accepted that the range between two strings of identical length
should include strings of another length. That seems maximally non-intuitive
(well, I suppose you could always return the last 100 words of Hamlet as an
iterable IO object if you really wanted to confuse people), and makes string
and integer ranges far too divergent.



  Then the whole question of reversibility is moot.

 Really? I don't think it is. In fact, you've simply made the problem pop up
 everywhere, and guaranteed that .. must behave totally unlike any other
 iterator.


 %hash.keys has similarly unordered semantics.


Unordered semantics and shuffled values aren't the same thing. The reason
that hash keys are unordered is that we cannot guarantee that any given
implementation will store entries in any given relation to the input. Ranges
have a well defined ordering associated with the elements that fall within
the range by virtue of the basic definition of a range (LHS <= * <= RHS).
Hashes have no ordering associated with their keys (though one can be
imposed, e.g. by sort).


Therefore %hash.keys.reverse is, for most purposes, equivalent to
 %hash.keys.


Argh! No, that's entirely untrue. %hash.keys and %hash.keys.reverse had
better be the same elements, but reversed for all hashes which remain
unmodified between the first and second call.
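
The expectation can be written down as a check. This is a sketch in present-day Raku with made-up hash contents; the key order itself is unspecified, so the only thing compared is one traversal against the reverse of another on an unmodified hash.

```raku
my %hash = a => 1, b => 2, c => 3, d => 4;

my @forward  = %hash.keys;           # some implementation-chosen order
my @backward = %hash.keys.reverse;   # a second traversal, reversed

# True whenever both traversals of the unmodified hash see the same order
say @backward.join(' ') eq @forward.reverse.join(' ');
```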


-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for a .. b

2010-07-28 Thread Jon Lang
Darren Duncan wrote:
 Does ... also come with the 4 variations of endpoint inclusion/exclusion?

 If not, then it should, as I'm sure many times one would want to do this,
 say:

  for 0...^$n -> {...}

You can toggle the inclusion/exclusion of the ending condition by
choosing between ... and ...^; but the starting point is the
starting point no matter what: there is neither ^... nor ^...^.
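
A minimal sketch of the two end-condition forms described above, in present-day Raku; $n = 5 is an arbitrary value chosen for illustration.

```raku
my $n = 5;
say 0 ... $n;     # (0 1 2 3 4 5)  end value included
say 0 ...^ $n;    # (0 1 2 3 4)    end value excluded

for 0 ...^ $n -> $i { print $i }   # prints 01234
print "\n";
```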

 In any event, I still think that the mnemonics of ... (yadda-yadda-yadda)
 are more appropriate to a generator, where it says produce this and so on.
  A .. does not have that mnemonic and looks better for an interval.

Well put.  This++.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-28 Thread Jon Lang
Aaron Sherman wrote:
 In smart-match context, "a".."b" includes "aardvark".


 No one has yet explained to me why that makes sense. The continued use of
 ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
 This is where Germans and Swedes, for example, don't agree, but they're all
 using the same Latin code blocks.

This is definitely something for the Unicode crowd to look into.  But
whatever solution you come up with, please make it compatible with the
notion that "aardvark".."apple" can be used to match any word in the
dictionary that comes between those two words.

 I've never accepted that the range between two strings of identical length
 should include strings of another length. That seems maximally non-intuitive
 (well, I suppose you could always return the last 100 words of Hamlet as an
 iterable IO object if you really wanted to confuse people), and makes string
 and integer ranges far too divergent.

This is why I dislike the notion of the range operator being used to
produce lists: the question of what values you'd get by iterating from
one string value to another is _very_ different from the question of
what string values qualify as being between the two.  The more you use
infix:<..> to produce lists, the more likely you are to conflate lists
with ranges.
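
The two questions can be put side by side. The sketch below uses present-day Raku, where they do give different answers; $range is just an illustrative name.

```raku
my $range = 'a'..'b';

say 'aardvark' ~~ $range;            # True: it sorts between the endpoints
say $range.list;                     # (a b): iteration visits only these
say 'aardvark' (elem) $range.list;   # False: never produced by iterating
```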

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-28 Thread Michael Zedeler

On 2010-07-29 00:24, Dave Whipp wrote:

Aaron Sherman wrote:
On Wed, Jul 28, 2010 at 11:34 AM, Dave Whipp d...@dave.whipp.name wrote:


To squint at this slightly, in the context that we already have 0...1e10 as
a sequence generator, perhaps the semantics of iterating a range should be
unordered -- that is,

 for 0..10 -> $x { ... }

is treated as

 for (0...10).pick(*) -> $x { ... }



As others have pointed out, this has some problems. You can't implement 0..*
that way, just for starters.


I'd say that's a point in my favor: it demonstrates that integers and 
strings have similar problems. If you pick items from an infinite set 
then every item you pick will have an infinite number of 
digits/characters.


In smart-match context, "a".."b" includes "aardvark". It follows that, 
unless you're filtering/shaping the sequence of generated items, then 
almost every element of ("a".."b").Seq starts with an infinite number of 
"a"s.


Consistent semantics would make "a".."b" very not-useful when used as 
a sequence: the user needs to say how they want to avoid the 
infinities. Similarly (0..1).Seq should most likely return Real 
numbers -- and thus (0..1).pick(*) can be approximated by 
(0..1).pick(*, :replace), which is much easier to implement.

I agree that, /in theory/, when coercing from Range to Sequence, the new 
Sequence should produce every possible value in the Range, unless you 
specify an increment. You could argue that 0 and 1 in (0..1).Seq are 
Ints, resulting in the expansion 0, 1, but that would leave a door open 
for very nasty surprises.


In practice, producing every possible value in a Range with 
over-countable items isn't useful and just opens the door for 
inexperienced programmers to make perl run out of memory without ever 
producing a warning, so I'd suggest that the conversion should fail 
unless an increment is specified.


The general principle would be to avoid meaningless conversions, so (1 
.. *).Seq and (1 .. *).pick should also just fail, but with finite 
endpoints, it could succeed. The question here is whether we should open 
up for more parallelization at the cost of simplicity. I don't know.


So either you define some arbitrary semantics (what those should be 
is, I think, the original topic of this thread) or else you punt 
(error message). An error message has the advantage that you can 
always do something useful, later.

I second that just doing something arbitrary where no actual definition 
exists is a really bad idea. To be more specific, there should be no 
.succ or .pred methods on Rat, Str, Real, Complex and anything else that 
is over-countable. Trying to implement .succ on something like Str is 
most likely dwimmy to a very narrow set of applications, but will 
confuse everyone else.


Just to illustrate my point, if we have .succ on Str, why not have it on 
Range or Seq?


Let's just play with that idea for a second - what would a reasonable 
implementation of .succ on Range be?


(1 .. 10).succ --?-- (1 .. 11)
(1 .. 10).succ --?-- (2 .. 11)
(1 .. 10).succ --?-- (1 .. 12)
(1 .. 10).succ --?-- (10^ .. *)

Even starting a discussion about which implementation of .succ for Range 
(above), Str, Rat or Real completely misses the point: there is no 
definition of this function for those domains. It is non-existent and 
trying to do something dwimmy is just confusing.


As a sidenote, ++ and .succ should be treated as two different things 
(just like -- and .pred). ++ really means "add one" everywhere and can 
be kept as such, whereas .succ means "the next, smallest possible item". 
This means that we can keep ++ and -- for all numeric types.
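
For reference, this is how .succ and ++ behave on strings and rationals in present-day Raku, which kept the dwimmy definitions rather than removing them as proposed here. The sketch only shows existing behaviour; it is not an implementation of the proposal.

```raku
say 'az'.succ;     # ba   (string "magic increment" with carry)
say 'zz'.succ;     # aaa  (carry grows the string)
say 1.5.succ;      # 2.5  (.succ on a Rat simply adds one)

my $x = 1.5;
$x++;              # ++ likewise adds one
say $x;            # 2.5
```

On a Rat the two coincide, which is exactly the case the distinction above would pull apart: "add one" is well defined there, "the next item" is not.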


Coercing to Sequence from Range should by default use .succ on the LHS, 
whereas Seq could just use ++ semantics as often as desired. This would 
make Ranges completely consistent and provide a clear distinction 
between the two classes.

Getting back to 10..0


Yes, I agree with Jon that this should be an empty range. I don't care 
what order you pick the elements from an empty range :).

Either empty, the same as 0 .. 10 or throw an error (I like errors :).

Regards,

Michael.



Re: Suggested magic for a .. b

2010-07-28 Thread Michael Zedeler

On 2010-07-29 01:39, Jon Lang wrote:

Aaron Sherman wrote:


In smart-match context, a..b includes aardvark.


No one has yet explained to me why that makes sense. The continued use of
ASCII examples, of course, doesn't help. Does a .. b include æther?
This is where Germans and Swedes, for example, don't agree, but they're all
using the same Latin code blocks.


This is definitely something for the Unicode crowd to look into.  But
whatever solution you come up with, please make it compatible with the
notion that aardvark..apple can be used to match any word in the
dictionary that comes between those two words.


The key issue here is whether there is a well defined and meaningful 
ordering of the characters in question. We keep discussing the nice 
examples, but how about "apple" .. "ส้ม"?


I don't know enough about Unicode to suggest how to solve this. All I 
can say is that my example above should never return a valid Range 
object unless there is a way I can specify my own ordering and I use it.



I've never accepted that the range between two strings of identical length
should include strings of another length. That seems maximally non-intuitive
(well, I suppose you could always return the last 100 words of Hamlet as an
iterable IO object if you really wanted to confuse people), and makes string
and integer ranges far too divergent.


This is why I dislike the notion of the range operator being used to
produce lists: the question of what values you'd get by iterating from
one string value to another is _very_ different from the question of
what string values qualify as being between the two.  The more you use
infix:<..> to produce lists, the more likely you are to conflate lists
with ranges.


I second the above. Ranges are all about comparing things. $x ~~ $a .. 
$b means "is $x between $a and $b?". The only broadly accepted 
comparison of strings is lexicographical comparison. To illustrate the 
point: wouldn't you find it odd if 2.01 wasn't in between 1.1 and 2.1? 
Really?


Regards,

Michael.



Re: Suggested magic for a .. b

2010-07-28 Thread Jon Lang
Michael Zedeler wrote:
 Jon Lang wrote:
 This is definitely something for the Unicode crowd to look into.  But
 whatever solution you come up with, please make it compatible with the
 notion that aardvark..apple can be used to match any word in the
 dictionary that comes between those two words.

 The key issue here is whether there is a well defined and meaningful
 ordering of the characters in question. We keep discussing the nice
 examples, but how about apple .. ส้ม?

All I'm saying is: don't throw out the baby with the bathwater.  Come
up with an interim solution that handles the nice examples intuitively
and the ugly examples poorly (or better, if you can manage that right
out of the gate); then revise the model to improve the handling of the
ugly examples as much as you can; but while you do so, make an effort
to keep the nice examples working.

 I don't know enough about Unicode to suggest how to solve this. All I can
 say is that my example above should never return a valid Range object unless
 there is a way I can specify my own ordering and I use it.

That actually says something: it says that we may want to reconsider
the notion that all string values can be sorted.  You're suggesting
the possibility that "a" cmp "ส้" is, by default, undefined.

There are some significant problems that arise if you do this.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-28 Thread Chris Fields
On Jul 28, 2010, at 1:37 PM, Mark J. Reed wrote:

 On Wed, Jul 28, 2010 at 2:30 PM, Chris Fields cjfie...@illinois.edu wrote:
 On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:
 Can I get an Amen?  Amen!
 --
 Mark J. Reed markjr...@gmail.com
 
 +1.  I'm agnostic ;
 
 Militant?  :)  ( http://tinyurl.com/3xjgxnl )
 
 Nothing inherently religious about amen (or me), but I'll accept
 +1 as synonymous.   :)
 
 -- 
 Mark J. Reed markjr...@gmail.com

Not militant, just trying to inject a bit of humor into the zombie thread that 
won't die.

chris

Re: Suggested magic for a .. b

2010-07-28 Thread Chris Fields
On Jul 28, 2010, at 1:27 PM, Mark J. Reed wrote:

 On Wednesday, July 28, 2010, Jon Lang datawea...@gmail.com wrote:
 Keep it simple, folks!  There are enough corner cases in Perl 6 as
 things stand; we don't need to be introducing more of them if we can
 help it.
 
 Can I get an Amen?  Amen!
 -- 
 Mark J. Reed markjr...@gmail.com

+1.  I'm agnostic ;

chris


Re: Suggested magic for a .. b

2010-07-28 Thread Michael Zedeler

On 2010-07-29 02:19, Jon Lang wrote:

Michael Zedeler wrote:
   

Jon Lang wrote:
 

This is definitely something for the Unicode crowd to look into.  But
whatever solution you come up with, please make it compatible with the
notion that aardvark..apple can be used to match any word in the
dictionary that comes between those two words.
   

The key issue here is whether there is a well defined and meaningful
ordering of the characters in question. We keep discussing the nice
examples, but how about apple .. ส้ม?
 

All I'm saying is: don't throw out the baby with the bathwater.  Come
up with an interim solution that handles the nice examples intuitively
and the ugly examples poorly (or better, if you can manage that right
out of the gate); then revise the model to improve the handling of the
ugly examples as much as you can; but while you do so, make an effort
to keep the nice examples working.
   
I am sorry if what I write is understood as an argument against ranges 
of strings. I think I know too little about Unicode to be able to do 
anything but point at some issues I believe we'll have to deal with. The 
solution is not obvious to me.

I don't know enough about Unicode to suggest how to solve this. All I can
say is that my example above should never return a valid Range object unless
there is a way I can specify my own ordering and I use it.
 

That actually says something: it says that we may want to reconsider
the notion that all string values can be sorted.  You're suggesting
the possibility that a cmp ส้ is, by default, undefined.
   

Yes, but I am sure it's due to my lack of understanding of Unicode.

Regards,

Michael.



Re: Suggested magic for a .. b

2010-07-28 Thread Darren Duncan

Jon Lang wrote:

I don't know enough about Unicode to suggest how to solve this. All I can
say is that my example above should never return a valid Range object unless
there is a way I can specify my own ordering and I use it.


That actually says something: it says that we may want to reconsider
the notion that all string values can be sorted.  You're suggesting
the possibility that a cmp ส้ is, by default, undefined.


I think that a general solution here is to accept that there may be more than 
one valid way to sort some types, strings especially, and so operators/routines 
that do sorting should be customizable in some way so users can pick the 
behaviour they want.


The customization could be applied at various levels, such as using an extra 
argument or trait for the operator/function that cares about ordering, or by 
using an extra attribute or trait for the types being sorted.


In fact, this whole issue is very close in concept to the situations where you 
need to do equality/identity tests.


With strings, identity tests can change answers depending on whether you are 
doing it on language-dependent or language-independent graphemes, and Perl 6 
encodes that abstraction level as value metadata.


When you want to be consistent, the behaviour of cmp affects all of the other 
order-sensitive operations, including any working with intervals.


Some possible examples of customization:

  $foo ~~ $a..$b :QuuxNationality  # just affects this one test

  $bar = 'hello' :QuuxNationality  # applies anywhere the Str value is used

Also, declaring a Str subtype or something.

Of course, after all this, we still want some reasonable default.  I suggest 
that for Str that aren't nationality-specific, the default ordering semantics 
are by whatever generic ordering Unicode defines, which might be by codepoint. 
And then for Str with nationality-specific grapheme abstractions, the default 
sorting can be whatever is the case for that nationality.  And this is how it is 
except where users define some other order.


So then, "a" cmp "ส้" is always defined, but users can change the definition.
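
The per-call flavour of customization can be sketched with what present-day Raku offers: sort accepts an explicit key extractor or comparator, so the ordering is chosen at the call site rather than attached to the strings. The :QuuxNationality adverb above remains hypothetical; the word list and the orderings chosen below are assumptions for illustration.

```raku
my @words = <apple Banana cherry>;

say @words.sort;                       # (Banana apple cherry)  default, by codepoint
say @words.sort(*.fc);                 # (apple Banana cherry)  case-insensitive key
say @words.sort({ $^b cmp $^a });      # (cherry apple Banana)  user-supplied order
```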

-- Darren Duncan


Re: Suggested magic for a .. b

2010-07-28 Thread Brandon S Allbery KF8NH
 On 7/28/10 8:07 PM, Michael Zedeler wrote:
 On 2010-07-29 01:39, Jon Lang wrote:
 Aaron Sherman wrote:
 In smart-match context, "a".."b" includes "aardvark".
 No one has yet explained to me why that makes sense. The continued use of
 ASCII examples, of course, doesn't help. Does "a" .. "b" include "æther"?
 This is where Germans and Swedes, for example, don't agree, but they're all
 using the same Latin code blocks.
 This is definitely something for the Unicode crowd to look into.  But
 whatever solution you come up with, please make it compatible with the
 notion that "aardvark".."apple" can be used to match any word in the
 dictionary that comes between those two words.
 The key issue here is whether there is a well defined and meaningful
 ordering of the characters in question. We keep discussing the nice
 examples, but how about "apple" .. "ส้ม"?

I thought that was already disallowed by spec.


Re: Suggested magic for a .. b

2010-07-27 Thread Aaron Sherman
Sorry I haven't responded for so long... much going on in my world.

On Mon, Jul 26, 2010 at 11:35 AM, Nicholas Clark n...@ccl4.org wrote:

 On Tue, Jul 20, 2010 at 07:31:14PM -0400, Aaron Sherman wrote:

  2) We deny that a range whose LHS is larger than its RHS makes sense, but
  we also don't provide an easy way to construct such ranges lazily otherwise.
  This would be annoying only, but then we have declared that ranges are the
  right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
  which is not lazy (blows up your machine) and feels awfully clunky next to
  for 1e10..1 -> $i {...} which would not blow up your machine, or even make
  it break a sweat, if it worked)

 There is no reason why for (1..1e10).reverse -> $i {...} should *not* be
 lazy.


As a special case, perhaps you can treat ranges as special and not as simple
iterators. To be honest, I wasn't thinking about the possibility of such
special cases, but about iterators in general. You can't generically reverse
lazy constructs without running afoul of the halting problem, which I invite
you to solve at your leisure ;-)

For example, let's just tie it to integer factorization to make it really
obvious:

 # Generator for ranges of sequential, composite integers
 sub composites(Int $start) { gather do { for $start .. * -> $i {
   last if isprime($i);
   take $i;
 } } }
 for composites(10116471302318).reverse -> $i { say $i }

The first value should be 10116471302380, but computing that without
iterating through the list from start to finish would require knowing that
none of the integers between 10116471302318 and 10116471302380, inclusive,
are prime. Of course, the same problem exists for any iterator where the end
condition or steps can't be easily pre-computed, but this makes it more
obvious than most.
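
The same generator at a scale small enough to check by hand, as a sketch in present-day Raku. It assumes the built-in Int.is-prime method in place of the isprime routine above, and the starting value 24 is chosen only because the next prime (29) is close.

```raku
# Lazily yield consecutive composites from $start, stopping at the first prime.
sub composites(Int $start) {
    gather for $start .. * -> $i {
        last if $i.is-prime;
        take $i;
    }
}

say composites(24);            # (24 25 26 27 28)
say composites(24).reverse;    # (28 27 26 25 24)
```

The reversed form still has to run the generator all the way to 29 before it can emit 28, which is the point being made: nothing about the generator tells .reverse where the tail is.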

That means that Range.reverse has to do something special that iterators in
general can't be relied on to do. Does that introduce problems? Not big
ones. I can definitely see people who are used to for ($a .. $b).reverse ->
... getting confused when for @blah.reverse -> ... blows up their
machine, but avoiding that confusion might not be practical.

PS: On a really abstract note, requiring that ($a .. $b).reverse be lazy
will put new constraints on the right hand side parameter. Previously, it
didn't have to have a value of its own, it just had to be comparable to
other values. for example:

  for $a .. $b -> $c { ... }

In that, we don't include the RHS in the output range explicitly. Instead,
we increment a $a (via .succ) until it's = $b. If $a were 1 and $b were an
object that does Int but just implements the comparison features, and has
no fixed numeric value, then it should still work (e.g. it could be random).
Now that's not possible because we need to use the RHS a the starting point
when .reverse is invoked.
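A rough Python sketch of that constraint, under the assumption that the right-hand side only has to answer a comparison. `OpaqueBound` and `forward` are hypothetical names for illustration, not anything from the spec.

```python
class OpaqueBound:
    """A right-hand endpoint that can only answer "is this value past me?".

    Hypothetical: it exposes no value of its own, only a comparison.
    """
    def __init__(self, limit):
        self._limit = limit  # hidden; the iteration below never reads it

    def is_exceeded_by(self, value):
        return value > self._limit

def forward(start, bound):
    """Walk upward from start, asking the bound when to stop."""
    current = start
    while not bound.is_exceeded_by(current):
        yield current
        current += 1  # stands in for .succ

print(list(forward(1, OpaqueBound(4))))  # [1, 2, 3, 4]
```

Forward iteration works because only `start` needs a concrete value. A lazy reverse would have to begin at the bound itself, which this object cannot supply.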

I have no idea if that matters, but it's important to be aware of when and
where we constrain the interface rather than discovering it later.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for a .. b

2010-07-27 Thread Jon Lang
Aaron Sherman wrote:
 As a special case, perhaps you can treat ranges as special and not as simple
 iterators. To be honest, I wasn't thinking about the possibility of such
 special cases, but about iterators in general. You can't generically reverse
 lazy constructs without running afoul of the halting problem, which I invite
 you to solve at your leisure ;-)

A really obvious example occurs when the RHS is a Whatever:

   (1..*).reverse;

.reverse magic isn't going to be generically applicable to all lazy
lists; but it can be applicable to all lazy lists that have predefined
start points, end points, and bidirectional iterators, and on all lazy
lists that have random-access iterators and some way of locating the
tail.  Sometimes you can guess what the endpoint and backward-iterator
should be from the start point and the forward-iterator, just as the
infix:<...> operator is able to guess what the forward-iterator should
be from the first one, two, or three items in the list.

This is especially a problem with regard to lists generated using the
series operator, as it's possible to define a custom forward-iterator
for it (but not, AFAICT, a custom reverse-iterator).  In comparison,
the simplicity of the range operator's list generation algorithm
almost guarantees that as long as you know for certain what or where
the last item is, you can lazily generate the list from its tail.  But
only almost:

   (1..3.5);         # list context: 1, 2, 3
   (1..3.5).reverse; # list context: 3.5, 2.5, 1.5 - assuming list is
                     # generated from tail.
   (1..3.5).reverse; # list context: 3, 2, 1 - but only if you generate
                     # it from the head first, and then reverse it.

Again, the proper tool for list generation is the series operator,
because it can do everything that the range operator can do in terms
of list generation, and more.

1 ... 3.5 # same as 1, 2, 3
3.5 ... 1 # same as 3.5, 2.5, 1.5 - and obviously so.
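The 1..3.5 discrepancy is easy to reproduce in any language. Here is a minimal Python sketch, with `from_head` and `from_tail` as made-up helper names, stepping by 1 from either end of the interval:

```python
def from_head(lhs, rhs):
    """Step upward from lhs while still inside the interval."""
    out, cur = [], lhs
    while cur <= rhs:
        out.append(cur)
        cur += 1
    return out

def from_tail(lhs, rhs):
    """Step downward from rhs while still inside the interval."""
    out, cur = [], rhs
    while cur >= lhs:
        out.append(cur)
        cur -= 1
    return out

print(from_head(1, 3.5)[::-1])  # [3, 2, 1]
print(from_tail(1, 3.5))        # [3.5, 2.5, 1.5]
```

The two results agree only when the distance between the endpoints is a whole number of steps.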

With this in mind, I see no reason to allow any magic on .reverse when
dealing with the range operator (or the series operator, for that
matter): as far as it's concerned, it's dealing with a list that lacks
a reverse-iterator, and so it will _always_ generate the list from its
head to its tail before attempting to reverse it.  Maybe at some later
point, after we get Perl 6.0 out the door, we can look into revising
the series operator to permit more powerful iterators so as to allow
.reverse and the like to bring more dwimmy magic to bear.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-27 Thread Michael Zedeler

On 2010-07-27 23:50, Aaron Sherman wrote:

PS: On a really abstract note, requiring that ($a .. $b).reverse be lazy
will put new constraints on the right hand side parameter. Previously, it
didn't have to have a value of its own, it just had to be comparable to
other values. for example:

   for $a .. $b -> $c { ... }

In that, we don't include the RHS in the output range explicitly. Instead,
we increment $a (via .succ) until it's >= $b. If $a were 1 and $b were an
object that does Int but just implements the comparison features, and has
no fixed numeric value, then it should still work (e.g. it could be random).
Now that's not possible because we need to use the RHS as the starting point
when .reverse is invoked.

This is exactly why I keep writing posts about Ranges being defunct as 
they have been specified now. If we accept the premise that Ranges are 
supposed to define a kind of linear membership specification between two 
starting points (as in math), it doesn't make sense that the LHS has an 
additional constraint (having to provide a .succ method). All we should 
require is that both endpoints support comparison (that they share a 
common type with comparison, at least).


To provide expansion to lists, such as for $a .. $b -> $c { ... }, we 
should use type coercion semantics, coercing from Range to Sequence and 
throwing an error if the LHS doesn't support .succ.


Writing ($a .. $b).reverse doesn't make any sense if the result were a 
new Range, since Ranges should then only be used for inclusion tests (so 
swapping endpoints doesn't have any meaningful interpretation), but 
applying .reverse could result in a coercion to Sequence.


Writing for ($a .. $b).reverse -> $c { ...} may then blow up because it 
turns out that $b doesn't have a .succ method when coercing to sequence 
(where the LHS must have an initial value), just like for $a .. $b -> $c 
{ ... } should be able to blow up because the LHS of a Range shouldn't 
have to support .succ.


Regards,

Michael.



Re: Suggested magic for a .. b

2010-07-26 Thread Nicholas Clark
On Tue, Jul 20, 2010 at 07:31:14PM -0400, Aaron Sherman wrote:

 2) We deny that a range whose LHS is larger than its RHS makes sense, but
 we also don't provide an easy way to construct such ranges lazily otherwise.
 This would be annoying only, but then we have declared that ranges are the
 right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
 which is not lazy (blows up your machine) and feels awfully clunky next to
 for 1e10..1 -> $i {...} which would not blow up your machine, or even make
 it break a sweat, if it worked)

There is no reason why for (1..1e10).reverse -> $i {...} should *not* be lazy.

After all, Perl 5 now implements

@b = reverse sort @a

by directly sorting in reverse. Note how it's now an ex-reverse:

$ perl -MO=Concise -e '@b = reverse sort @a'
c  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
b     <2> aassign[t6] vKS ->c
-        <1> ex-list lK ->8
3           <0> pushmark s ->4
-           <1> ex-reverse lK/1 ->-
4              <0> pushmark s ->5
7              <@> sort lK/REV ->8
-                 <0> ex-pushmark s ->5
6                 <1> rv2av[t4] lK/1 ->7
5                    <#> gv[*a] s ->6
-        <1> ex-list lK ->b
8           <0> pushmark s ->9
a           <1> rv2av[t2] lKRM*/1 ->b
9              <#> gv[*b] s ->a
-e syntax OK

Likewise

foreach (reverse @a) {...}

is implemented as a reverse iterator on the array, rather than a temporary
list:

$ perl -MO=Concise -e 'foreach(reverse @a) {}'
d  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 2 -e:1) v ->3
c     <2> leaveloop vK/2 ->d
7        <{> enteriter(next->9 last->c redo->8) lKS/REVERSED ->a
-           <0> ex-pushmark s ->3
-           <1> ex-list lKM ->6
3              <0> pushmark s ->4
-              <1> ex-reverse lKM/1 ->6
-                 <0> ex-pushmark s ->4
5                 <1> rv2av[t2] sKR/1 ->6
4                    <#> gv[*a] s ->5
6           <#> gv[*_] s ->7
-        <1> null vK/1 ->c
b           <|> and(other->8) vK/1 ->c
a              <0> iter s/REVERSED ->b
-              <@> lineseq vK ->-
8                 <0> stub v ->9
9                 <0> unstack v ->a
-e syntax OK



If it's part of the specification that (1..1e10).reverse is to be implemented
lazily, I'd (personally) consider that an easy enough way to construct a lazy
range.
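For comparison, this is how one existing language handles the same case. In Python 3, `reversed()` on a `range` object is computed from the endpoints and never builds the list, so the analogue of (1..1e10).reverse is cheap. This is offered only as an illustration that the optimization is practical, not as a statement about how Perl 6 should implement it.

```python
from itertools import islice

big = range(1, 10**10 + 1)   # 1..1e10; nothing is materialized
countdown = reversed(big)    # iterator derived from the endpoints

first_three = list(islice(countdown, 3))
print(first_three)  # [10000000000, 9999999999, 9999999998]
```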


This doesn't answer any of your other questions about what ranges of
character strings should mean. I don't really have an opinion, other than
it needs to be simple enough to be teachable.

Nicholas Clark


Re: Suggested magic for a .. b

2010-07-21 Thread Smylers
Jon Lang writes:

 Approaching this with the notion firmly in mind that infix:<..> is
 supposed to be used for matching ranges while infix:<...> should be
 used to generate series:
 
 With series, we want C< $LHS ... $RHS > to generate a list of items
 starting with $LHS and ending with $RHS.  If $RHS > $LHS, we want it
 to increment one step at a time; if $RHS < $LHS, we want it to
 decrement one step at a time.

Do we? I'm used to generating lists and iterating over them (in Perl 5)
with things like:

  for (1 .. $max)

where the intention is that if $max is zero, the loop doesn't execute at
all. Having the equivalent Perl 6 list generation operator, C<...>,
start counting backwards could be confusing.

Especially if Perl 6 also has a range operator, C<..>, which would Do
The Right Thing for me in this situation, and where the Perl 6 operator
that Does The Right Thing is spelt the same as the Perl 5 operator that
I'm used to; that muddles the distinction you make above about matching
ranges versus generating lists.

Smylers
-- 
http://twitter.com/Smylers2


Re: Suggested magic for a .. b

2010-07-21 Thread Jon Lang
Smylers wrote:
 Jon Lang writes:
 Approaching this with the notion firmly in mind that infix:<..> is
 supposed to be used for matching ranges while infix:<...> should be
 used to generate series:

 With series, we want C< $LHS ... $RHS > to generate a list of items
 starting with $LHS and ending with $RHS.  If $RHS > $LHS, we want it
 to increment one step at a time; if $RHS < $LHS, we want it to
 decrement one step at a time.

 Do we?

Yes, we do.

 I'm used to generating lists and iterating over them (in Perl 5)
 with things like:

  for (1 .. $max)

 where the intention is that if $max is zero, the loop doesn't execute at
 all. Having the equivalent Perl 6 list generation operator, C<...>,
 start counting backwards could be confusing.

 Especially if Perl 6 also has a range operator, C<..>, which would Do
 The Right Thing for me in this situation, and where the Perl 6 operator
 that Does The Right Thing is spelt the same as the Perl 5 operator that
 I'm used to; that muddles the distinction you make above about matching
 ranges versus generating lists.

It does muddy the difference, which is why my own gut instinct would
have been to do away with infix:<..>'s ability to generate lists.
Fortunately, I'm not in charge here, and wiser heads than mine have
decreed that infix:<..>, when used in list context, will indeed
generate a list in a manner that closely resembles Perl 5's range
operator: start with the LHS, then increment until you equal or exceed
the RHS - and if you start out exceeding the RHS, you've got yourself
an empty list.
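The difference Smylers is worried about can be shown with a small Python sketch. `range_list` models the range behaviour just described, and `series` models the auto-direction behaviour proposed for the series operator earlier in this thread. Both names are invented for the example, and neither claims to match the spec in detail.

```python
def range_list(lhs, rhs):
    """Perl 5 style: count up from lhs; empty if lhs already exceeds rhs."""
    return list(range(lhs, rhs + 1))

def series(lhs, rhs):
    """Count toward rhs in whichever direction reaches it."""
    step = 1 if rhs >= lhs else -1
    return list(range(lhs, rhs + step, step))

max_value = 0
print(range_list(1, max_value))  # []      loop body never runs
print(series(1, max_value))      # [1, 0]  loop body runs twice
```

With `max_value` set to 0, the first form skips the loop entirely while the second counts down, which is the surprise being discussed.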

You can do the same thing with the infix:<...> operator, too; but
doing so will be bulkier (albeit much more intuitive).  For example,
the preferred Perl 6 approach to what you described would be:

for 1, 2 ... $x

The two-element list on the left of the series operator invokes a bit
of magic that tells it that the algorithm for generating the next step
in the series is to invoke the increment operator.  This is all
described in S03 in considerable detail; I suggest rereading the
section there concerning the series operator before passing judgment
on it.

--
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-21 Thread Mark J. Reed
Ok, I find that surprising (and counter to current Rakudo behavior),
but thanks for the correction, and sorry about the misinformation.

On Wednesday, July 21, 2010, Larry Wall la...@wall.org wrote:
 On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
 : In particular, consider that pi ~~ 0..4 is true,
 :  because pi is within the range; but pi ~~ 0...4 is false, because pi
 : is not one of the generated elements.

 Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
 also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
 on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
 pattern 0..4 may only match a list with the same 5 elements in the
 same order.

 Larry


-- 
Mark J. Reed markjr...@gmail.com


Re: Suggested magic for a .. b

2010-07-21 Thread Mark J. Reed
Strike the "counter to current Rakudo behavior" bit; Rakudo is
behaving as specified in this instance.  I must have been
hallucinating.

On Wed, Jul 21, 2010 at 7:33 AM, Mark J. Reed markjr...@gmail.com wrote:
 Ok, I find that surprising (and counter to current Rakudo behavior),
 but thanks for the correction, and sorry about the misinformation.

 On Wednesday, July 21, 2010, Larry Wall la...@wall.org wrote:
 On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
 : In particular, consider that pi ~~ 0..4 is true,
 :  because pi is within the range; but pi ~~ 0...4 is false, because pi
 : is not one of the generated elements.

 Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
 also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
 on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
 pattern 0..4 may only match a list with the same 5 elements in the
 same order.

 Larry


 --
 Mark J. Reed markjr...@gmail.com




-- 
Mark J. Reed markjr...@gmail.com


Re: Suggested magic for a .. b

2010-07-21 Thread Larry Wall
On Wed, Jul 21, 2010 at 09:23:11AM -0400, Mark J. Reed wrote:
: Strike the "counter to current Rakudo behavior" bit; Rakudo is
: behaving as specified in this instance.  I must have been
: hallucinating.

Well, except that we both neglected precedence.   Since ... is looser
than ~~, it must be written 3 ~~ (0...4).  :-)

Larry


Re: Suggested magic for a .. b

2010-07-21 Thread Aaron Sherman
On Wed, Jul 21, 2010 at 1:28 AM, Aaron Sherman a...@ajs.com wrote:


 For reference, this is the relevant section of the spec:

 Character positions are incremented within their natural range for any
 Unicode range that is deemed to represent the digits 0..9 or that is deemed
 to be a complete cyclical alphabet for (one case of) a (Unicode) script.
 Only scripts that represent their alphabet in codepoints that form a cycle
 independent of other alphabets may be so used. (This specification defers to
 the users of such a script for determining the proper cycle of letters.) We
 arbitrarily define the ASCII alphabet not to intersect with other scripts
 that make use of characters in that range, but alphabets that intersperse
 ASCII letters are not allowed.


 I'm not sure that all of that tracks with the Unicode standard's use of
 some of the terms, but based on what we've discussed, perhaps we could get
 more specific there:

 Character positions are incremented within their Unicode Script, but only
 in keeping with their General Category property. Thus C<A++> yields C<B>
 which is the next codepoint, but C<Ă++> yields C<Ą> even though ă
 falls between the two, when incrementing codepoints. Should this prove
 problematic for any specific Unicode Script which requires special handling
 (e.g. because a letter really isn't used as a letter at all), such special
 handling may be applied, but the above is the general rule.


Oh, so close! I realized that I broke the original spec, here. We need to
add back in:

There are two special cases: the ASCII-compatible lower-case letters (a-z)
and the ASCII-compatible upper-case letters (A-Z). For historical reasons,
these, by default, will not increment past the end of their ranges into the
higher-codepoint Latin characters.
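A minimal sketch of the "same General Category" half of that rule, in Python. The standard `unicodedata` module exposes General Category but not the Script property, so the script check is left out here; that omission and the function name `succ_same_category` are assumptions of this sketch, not part of the proposal.

```python
import unicodedata

def succ_same_category(ch):
    """Next codepoint sharing ch's General Category, or None if there is none."""
    wanted = unicodedata.category(ch)
    for cp in range(ord(ch) + 1, 0x110000):
        if unicodedata.category(chr(cp)) == wanted:
            return chr(cp)
    return None

print(succ_same_category("A"))       # B
print(succ_same_category("\u0102"))  # Ą (U+0104), skipping lowercase ă (U+0103)
```

Without the ASCII special case described above, this function would happily step from Z into the accented capitals, which is why the exception has to be stated separately.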


Note: we might want a pragma for that as well. I'd suggest that perhaps it
should be a locale-specific feature? So, if you set your locale to fr, then
you include in those ranges all of the Latin characters used in French.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for a .. b

2010-07-21 Thread Darren Duncan

Larry Wall wrote:

On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
: In particular, consider that pi ~~ 0..4 is true,
:  because pi is within the range; but pi ~~ 0...4 is false, because pi
: is not one of the generated elements.

Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
pattern 0..4 may only match a list with the same 5 elements in the
same order.


For some reason I thought smart match in Perl 6, when presented with some 
collection on the right-hand side, would test if the value on the left-hand side 
was contained in the collection.


So, for example:

  my @ary = (1,4,3,2,9);
  my $test = 3;
  $test ~~ @ary;  # TRUE

Similarly, since a range represents a set of all values between 2 endpoints, I 
might have thought this would be reasonable:


  3 ~~ 1..5  # TRUE

So if that doesn't work, then what is the canonical way to ask if a value is in 
a range?


Would any of these be reasonable?

  3 ~~ any(1..5)

  3 in 1..5

  3 ∈ 1..5  # Unicode alternative

-- Darren Duncan


Re: Suggested magic for a .. b

2010-07-21 Thread Mark J. Reed
On Wed, Jul 21, 2010 at 3:55 PM, Darren Duncan dar...@darrenduncan.net wrote:
 Larry Wall wrote:

 On Tue, Jul 20, 2010 at 11:53:27PM -0400, Mark J. Reed wrote:
 : In particular, consider that pi ~~ 0..4 is true,
 :  because pi is within the range; but pi ~~ 0...4 is false, because pi
 : is not one of the generated elements.

 Small point here, it's not because pi is fractional: 3 ~~ 0...4 is
 also false because 3 !eqv (0,1,2,3,4).  There is no implicit any()
 on a smartmatch list pattern as there is in Perl 5.  In Perl 6 the
 pattern 0..4 may only match a list with the same 5 elements in the
 same order.

 For some reason I thought smart match in Perl 6, when presented with some
 collection on the right-hand side, would test if the value on the left-hand
 side was contained in the collection.

That was my thought as well.

 Similarly, since a range represents a set of all values between 2 endpoints,
 I might have thought this would be reasonable:

  3 ~~ 1..5  # TRUE

AIUI, that is indeed correct.  Ranges smartmatch by testing for
inclusion in the range.  But collections don't smartmatch by testing
for inclusion in the collection.  Which was probably the subject of a
thread I missed somewhere...

For series, I think the canonical solution is to use any().
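The three kinds of match being distinguished here can be written out in Python terms. This is only an analogy for the semantics described in this thread (inclusion for ranges, whole-value equivalence for lists, explicit any() for membership); the helper names are made up.

```python
import math

def in_range(value, lo, hi):
    """Range-style match: is the value between the endpoints?"""
    return lo <= value <= hi

def matches_list(value, items):
    """List-style match: whole-value equivalence, no implicit any()."""
    return value == items

def matches_any(value, items):
    """Explicit membership test, the any() form."""
    return any(value == item for item in items)

generated = [0, 1, 2, 3, 4]          # what 0...4 would generate

print(in_range(math.pi, 0, 4))       # True
print(matches_list(3, generated))    # False
print(matches_any(3, generated))     # True
```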

-- 
Mark J. Reed markjr...@gmail.com


Re: Suggested magic for a .. b

2010-07-20 Thread Solomon Foster
On Tue, Jul 20, 2010 at 7:31 PM, Aaron Sherman a...@ajs.com wrote:
 2) We deny that a range whose LHS is larger than its RHS makes sense, but
 we also don't provide an easy way to construct such ranges lazily otherwise.
 This would be annoying only, but then we have declared that ranges are the
 right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
 which is not lazy (blows up your machine) and feels awfully clunky next to
 for 1e10..1 -> $i {...} which would not blow up your machine, or even make
 it break a sweat, if it worked)

Ranges haven't been intended to be the right way to construct basic
loops for some time now.  That's what the ... series operator is
for.

for 1e10 ... 1 -> $i {
 # whatever
}

is lazy by the spec, and in fact is lazy and fully functional in
Rakudo.  (Errr... okay, actually it just seg faulted after hitting
968746 in the countdown.  But that's a Rakudo bug unrelated to
this, I'm pretty sure.)

All the magic that one wants for handling loop indices -- going
backwards, skipping numbers, geometric series, and more -- is present
in the series operator.  Range is not supposed to do any of that stuff
other than the most basic forward sequence.

-- 
Solomon Foster: colo...@gmail.com
HarmonyWare, Inc: http://www.harmonyware.com


Re: Suggested magic for a .. b

2010-07-20 Thread Jon Lang
Solomon Foster wrote:
 Ranges haven't been intended to be the right way to construct basic
 loops for some time now.  That's what the ... series operator is
 for.

    for 1e10 ... 1 - $i {
         # whatever
    }

 is lazy by the spec, and in fact is lazy and fully functional in
 Rakudo.  (Errr... okay, actually it just seg faulted after hitting
 968746 in the countdown.  But that's a Rakudo bug unrelated to
 this, I'm pretty sure.)

You took the words out of my mouth.

 All the magic that one wants for handling loop indices -- going
 backwards, skipping numbers, geometric series, and more -- is present
 in the series operator.  Range is not supposed to do any of that stuff
 other than the most basic forward sequence.

Here, though, I'm not so sure: I'd like to see how many of Aaron's
issues remain unresolved once he reframes them in terms of the series
operator.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-20 Thread Solomon Foster
On Tue, Jul 20, 2010 at 10:00 PM, Jon Lang datawea...@gmail.com wrote:
 Solomon Foster wrote:
 Ranges haven't been intended to be the right way to construct basic
 loops for some time now.  That's what the ... series operator is
 for.

    for 1e10 ... 1 - $i {
         # whatever
    }

 is lazy by the spec, and in fact is lazy and fully functional in
 Rakudo.  (Errr... okay, actually it just seg faulted after hitting
 968746 in the countdown.  But that's a Rakudo bug unrelated to
 this, I'm pretty sure.)

 You took the words out of my mouth.

 All the magic that one wants for handling loop indices -- going
 backwards, skipping numbers, geometric series, and more -- is present
 in the series operator.  Range is not supposed to do any of that stuff
 other than the most basic forward sequence.

 Here, though, I'm not so sure: I'd like to see how many of Aaron's
 issues remain unresolved once he reframes them in terms of the series
 operator.

Sorry, didn't mean to imply the series operator was perfect.  (Though
it is surprisingly awesome in  general, IMO.)  Just that the right
questions would be about the series operator rather than Ranges.

The questions definitely look different that way: for example,
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
clearly expressed as

'A' ... 'Z', 'a' ... 'z' # don't think this works in Rakudo yet  :(

That suggests to me that the current behavior of 'A' ... 'z' is pretty
reasonable.

-- 
Solomon Foster: colo...@gmail.com
HarmonyWare, Inc: http://www.harmonyware.com


Re: Suggested magic for a .. b

2010-07-20 Thread Aaron Sherman
Side note: you could get around some of the problems, below, but in order to
do so, you would have to exhaustively express all of Unicode using the Str
builtin module's RANGES constant. In fact, as it is now, it defines ASCII
lowercase, but doesn't define Latin lowercase. Presumably because doing so
would be a massive pain. Again, I'll point out that using script and
properties is much easier.

On Tue, Jul 20, 2010 at 10:35 PM, Solomon Foster colo...@gmail.com wrote:


 Sorry, didn't mean to imply the series operator was perfect.  (Though
 it is surprisingly awesome in  general, IMO.)  Just that the right
 questions would be about the series operator rather than Ranges.


So, what's the intention of the range operator, then? Is it just there to
offer backward compatibility with Perl 5? Is it a vestige that should be
removed so that we can Huffman ... down to ..?

I'm not trying to be difficult, here, I just never knew that ... could
operate on a single item as LHS, and if it can, then .. seems to be obsolete
and holding some prime operator real estate.



 The questions definitely look different that way: for example,
 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
 clearly expressed as

'A' ... 'Z', 'a' ... 'z' # don't think this works in Rakudo yet  :(


I still contend that this is so frequently desirable that it should have a
simpler form, but it's still going to have problems.

One example: for expressing Katakana letters (I use letters in the
Unicode sense, here) it's still dicey. There are things interspersed in the
Unicode sequence for Katakana that aren't the same thing at all. Unicode
calls them lowercase, but that's not quite right. They're smaller versions
of Katakana characters which are used more as punctuation or accents than as
syllabic glyphs the way the rest of Katakana is.

I guess you could write:

  ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste)

But that seems quite a bit more painful than:

 ア .. ヴ (or ... if you prefer)

Similar problems exist for many scripts (including some of Latin, we're just
used to the parts that are odd), though I think it's possible that Katakana
may be the worst because of the mis-use of Ll to indicate a letter when the
truth of the matter is far more complicated.



 That suggests to me that the current behavior of 'A' ... 'z' is pretty
 reasonable.


You still have to decide to make at least some allowances for invalid
codepoints and I think you should avoid ever generating a combining or
modifying codepoint in such a sequence (e.g. Ѻ ... Ҋ in Cyrillic which
contains several combining characters for currency and counting as well as
one undefined codepoint).

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for a .. b

2010-07-20 Thread Jon Lang
Approaching this with the notion firmly in mind that infix:<..> is
supposed to be used for matching ranges while infix:<...> should be
used to generate series:

Aaron Sherman wrote:
 Walk with me a bit, and let's explore the concept of intuitive character
 ranges? This was my suggestion, which seems pretty basic to me:

 x .. y, for all strings x and y, which are composed of a single, valid
 codepoint which is neither combining nor modifying, yields the range of all
 valid, non-combining/modifying codepoints between x and y, inclusive which
 share the Unicode script, general category major property and general
 category minor property of either x or y (lack of a minor property is a
 valid value).

This is indeed true for both range-matching and series-generation as
the spec is currently written.

 In general we have four problems with current specification and
 implementation on the Perl 6 and Perl 5 sides:

 1) Perl 5 and Rakudo have a fundamental difference of opinion about what
 some ranges produce (A .. z, X .. T, etc) and yet we've never really
 articulated why we want that.

 2) We deny that a range whose LHS is larger than its RHS makes sense, but
 we also don't provide an easy way to construct such ranges lazily otherwise.
 This would be annoying only, but then we have declared that ranges are the
 right way to construct basic loops (e.g. for (1..1e10).reverse -> $i {...}
 which is not lazy (blows up your machine) and feels awfully clunky next to
 for 1e10..1 -> $i {...} which would not blow up your machine, or even make
 it break a sweat, if it worked)

With ranges, we want C< when $LHS .. $RHS > to always mean C<< if
$LHS <= $_ <= $RHS >>.  If $RHS < $LHS, then the range being specified
is not valid.  In this context, it makes perfect sense to me why it
doesn't generate anything.

With series, we want C< $LHS ... $RHS > to generate a list of items
starting with $LHS and ending with $RHS.  If $RHS > $LHS, we want it
to increment one step at a time; if $RHS < $LHS, we want it to
decrement one step at a time.

So: 1) we want different behavior from the Range operator in Perl 6
vs. Perl 5 because we have completely re-envisioned the range
operator.  What we have replaced it with is fundamentally more
flexible, though not necessarily perfect.

 3) We've never had a clear-cut goal in allowing string ranges (as opposed to
 character ranges, which Perl 5 and 6 both muddy a bit), so intuitive
 becomes sketchy at best past the first grapheme, and ever muddier when only
 considering codepoints (thus that wing of my proposal and current behavior
 are on much shakier ground, except in so far as it asserts that we might
 want to think about it more).

I think that one notion that we're dealing with here is the idea that
C<< $X < $X.succ >> for all strings.  This seems to be a rather
intuitive assumption to make; but it is apparently not an assumption
that Stringy.succ makes.  As I understand it, Z.succ eqv AA.  What
benefit do we gain from this behavior?  Is it the idea that eventually
this will iterate over every possible combination of capital letters?
If so, why is that a desirable goal?


My own gut instinct would be to define the string iterator such that
it increments the final letter in the string until it gets to Z;
then it resets that character to A and increments the next character
by one:

ABE, ABF, ABG ... ABZ, ACA, ACB ... ZZZ

This pattern ensures that for any two strings in the series, the first
one will be less than its successor.  It does not ensure that every
possible string between ABE and ZZZ will be represented; far from
it.  But then, 1...9 doesn't produce every number between 1 and 9; it
only produces integers.  Taken to an extreme: pi falls between 1 and
9; but no one in his right mind expects us to come up with a general
sequencing of numbers that increments from 1 to 9 with a guarantee
that it will hit pi before reaching 9.

Mind you, I know that the above is full of holes.  In particular, it
works well when you limit yourself to strings composed of capital
letters; do anything fancier than that, and it falls on its face.
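The fixed-width "odometer" successor described above could look roughly like this in Python. It is a sketch restricted to capital letters, as the caveat says; `odometer_succ` is a made-up name, and returning `None` at the end of the series is a choice of this example rather than anything in the spec.

```python
import string

def odometer_succ(s, alphabet=string.ascii_uppercase):
    """Increment the last letter; on overflow reset it and carry left."""
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        pos = alphabet.index(chars[i])
        if pos + 1 < len(alphabet):
            chars[i] = alphabet[pos + 1]
            return "".join(chars)
        chars[i] = alphabet[0]  # roll over and carry into the next position
        i -= 1
    return None  # every position overflowed (e.g. "ZZZ"): the series ends

print(odometer_succ("ABE"))  # ABF
print(odometer_succ("ABZ"))  # ACA
print(odometer_succ("ZZZ"))  # None
```

Because the width never grows, every string compares less than its successor, unlike the Z-to-AA behaviour questioned earlier in this message.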

 4) Many ranges involving single characters on LHS and RHS result in null
 or infinite output, which is deeply non-intuitive to me, and I expect many
 others.

Again, the distinction between range-matching and series-generation
comes to the rescue.

 Solve those (and I tried in my suggestion) and I think you will be able to
 apply intuition to character ranges, but only in so far as a human being is
 likely to be able to intuit anything related to Unicode.

Of the points that you raise, #1, 2, and 4 are neatly solved already.
I'm unsure as to #3; so I'd recommend focusing some scrutiny on it.

 The current behaviour of the range operator is (if I recall correctly):
 1) if both sides are single characters, make a range by incrementing
 codepoints


 Sadly, you can't do that reasonably. Here are some examples of why, using
 only Latin and Greek as examples (not the most convoluted Unicode 

Re: Suggested magic for a .. b

2010-07-20 Thread Jon Lang
Aaron Sherman wrote:
 So, what's the intention of the range operator, then? Is it just there to
 offer backward compatibility with Perl 5? Is it a vestige that should be
 removed so that we can Huffman ... down to ..?

 I'm not trying to be difficult, here, I just never knew that ... could
 operate on a single item as LHS, and if it can, then .. seems to be obsolete
 and holding some prime operator real estate.

On the contrary: it is not a vestige, it is not obsolete, and it's
making good use of the prime operator real estate that it's holding.
It's just not doing what it did in Perl 5.

I strongly recommend that you reread S03 to find out exactly what each
of these operators does these days.

 The questions definitely look different that way: for example,
 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
 clearly expressed as

    'A' ... 'Z', 'a' ... 'z'     # don't think this works in Rakudo yet  :(


 I still contend that this is so frequently desirable that it should have a
 simpler form, but it's still going to have problems.

 One example: for expressing Katakana letters (I use letters in the
 Unicode sense, here) it's still dicey. There are things interspersed in the
 Unicode sequence for Katakana that aren't the same thing at all. Unicode
 calls them lowercase, but that's not quite right. They're smaller versions
 of Katakana characters which are used more as punctuation or accents than as
 syllabic glyphs the way the rest of Katakana is.

 I guess you could write:

  ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste)

 But that seems quite a bit more painful than:

  ア .. ヴ (or ... if you prefer)

 Similar problems exist for many scripts (including some of Latin, we're just
 used to the parts that are odd), though I think it's possible that Katakana
 may be the worst because of the mis-use of Ll to indicate a letter when the
 truth of the matter is far more complicated.

Some of this might be addressed by filtering the list as you go -
though I don't remember the method for doing so.  Something like
.grep, I think, with a regex in it that only accepts letters:

(ア ... ヴ).«grep(/:alpha:/)

...or something to that effect.

Still, it's possible that we might need something that's more flexible
than that.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-20 Thread Jon Lang
Mark J. Reed wrote:
 Perhaps the syllabic kana could be the integer analogs, and what you
 get when you iterate over the range using ..., while the modifier kana
 would not be generated by the series  ア ... ヴ but would be considered
 in the range  ア .. ヴ?  I wouldn't object to such script-specific
 behavior, though perhaps it doesn't belong in core.

As I understand it, it wouldn't need to be script-specific behavior;
just behavior that's aware of Unicode properties.  That particular
issue doesn't come up with the English alphabet because there aren't
any modifier codepoints embedded in the middle of the standard
alphabet.  And if there were, I'd hope that they'd be filtered out
from the series generation by default.

And I'd hope that there would be a way to turn the default filtering
off when I don't want it.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-20 Thread Mark J. Reed
On Wed, Jul 21, 2010 at 12:04 AM, Jon Lang datawea...@gmail.com wrote:
 Mark J. Reed wrote:
 Perhaps the syllabic kana could be the integer analogs, and what you
 get when you iterate over the range using ..., while the modifier kana
 would not be generated by the series  ア ... ヴ but would be considered
 in the range  ア .. ヴ?  I wouldn't object to such script-specific
 behavior, though perhaps it doesn't belong in core.

 As I understand it, it wouldn't need to be script-specific behavior;
 just behavior that's aware of Unicode properties.

That wouldn't help in this case.  For example, U+30A1 KATAKANA LETTER
SMALL A - the small modifier variety of letter under discussion -
is not a modifier in the Unicode sense.  It has exactly the same
properties as U+30A2 KATAKANA LETTER A, an actual syllable:

30A1;KATAKANA LETTER SMALL A;Lo;0;L;N;
30A2;KATAKANA LETTER A;Lo;0;L;N;

So without script-specific special-case code, there's no way to
distinguish them.  As Aaron said, they're treated like lowercase, but
that's not an accurate representation of how they're used in actual
text, or of the common idea of what constitutes the set of kana.
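
The identical properties can be checked mechanically. What follows is only a sketch in Python (chosen for illustration, it is not part of the thread), using the standard unicodedata module; the helper name core_props is made up for this example:

```python
import unicodedata

small_a = "\u30a1"   # KATAKANA LETTER SMALL A
plain_a = "\u30a2"   # KATAKANA LETTER A

def core_props(ch):
    """General category, combining class and bidi class of one character."""
    return (unicodedata.category(ch),
            unicodedata.combining(ch),
            unicodedata.bidirectional(ch))

for ch in (small_a, plain_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {core_props(ch)}")

# Only the character names differ; the properties compared here do not.
same = core_props(small_a) == core_props(plain_a)
```

Any filter built on these properties alone would therefore keep or drop both characters together.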

-- 
Mark J. Reed markjr...@gmail.com


Re: Suggested magic for a .. b

2010-07-20 Thread Aaron Sherman
OK, there's a lot here and my head is swimming, so let me re-consolidate and
re-state (BTW: thanks Jon, you've really helped me understand, here).

1) The spec is somewhat vague, but the proposal that I made for single
characters is not an unreasonable interpretation of what's there. Thus, we
could adopt the script/major cat/minor cat triplet as the core tool that
.succ will use for single, non-combining, non-modifying, valid characters?

2) The spec doesn't put this information anywhere near the definition of the
range operator. Perhaps we can make a note? This was a source of confusion
for me.

3) It seems that there are two competing multi-character approaches and both
seem somewhat valid. Should we use a pragma to toggle behavior between A and
B:

 A: aa .. bb contains az
 B: aa .. bb contains ONLY aa, ab, ba and bb

4) About the ranges I gave as examples, you asked:

Which codepoint is invalid, and why?

There's just an undefined codepoint smack in the middle of the Greek
uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for
that somewhere, but my guess is that there's some thousand-year-old debate
about the Greek alphabet behind it.
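
A quick way to see the gap, sketched in Python with the standard unicodedata module (the names GAP, assigned and greek_upper are invented for this illustration):

```python
import unicodedata

GAP = "\u03a2"  # the slot between RHO (U+03A1) and SIGMA (U+03A3)

def assigned(ch):
    """True unless the codepoint has general category Cn (unassigned)."""
    return unicodedata.category(ch) != "Cn"

# Walk ALPHA (U+0391) through OMEGA (U+03A9), skipping unassigned slots.
greek_upper = [chr(cp) for cp in range(0x0391, 0x03A9 + 1) if assigned(chr(cp))]

print(assigned(GAP), len(greek_upper))
```

A range implementation that steps by raw codepoint has to decide what to do with such a slot; one that steps by properties skips it naturally.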

In both of these cases, what do you think it should produce?

I actually gave that answer a bit later on. I think that Ā .. Ē should
produce ĀĂĄĆĈĊČĎĐĒ and オ .. ヺ should produce
オカガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモヤユヨラリルレロワヰヱヲンヴヷヸヹヺ
which are all of the Katakana syllabic characters.

I also have to wonder how or if 0 ... z ought to be resolved.  If
you're thinking in terms of the alphabet or digits, this is
nonsensical

Well, since you agreed with my statement about the properties checking, it
would be 0 through 9 and then a through z because 0 through 9 are Latin
numbers, matching the LHS's properties and a through z are lowercase Latin
letters, matching the RHS's properties.

For reference, this is the relevant section of the spec:

Character positions are incremented within their natural range for any
Unicode range that is deemed to represent the digits 0..9 or that is deemed
to be a complete cyclical alphabet for (one case of) a (Unicode) script.
Only scripts that represent their alphabet in codepoints that form a cycle
independent of other alphabets may be so used. (This specification defers to
the users of such a script for determining the proper cycle of letters.) We
arbitrarily define the ASCII alphabet not to intersect with other scripts
that make use of characters in that range, but alphabets that intersperse
ASCII letters are not allowed.


I'm not sure that all of that tracks with the Unicode standard's use of some
of the terms, but based on what we've discussed, perhaps we could get more
specific there:

Character positions are incremented within their Unicode Script, but only in
keeping with their General Category property. Thus "A"++ yields "B",
which is the next codepoint, but "Ă"++ yields "Ą" even though "ă"
falls between the two when incrementing codepoints. Should this prove
problematic for any specific Unicode Script which requires special handling
(e.g. because a letter really isn't used as a letter at all), such special
handling may be applied, but the above is the general rule.


and then in the section on ranges:

As discussed previously, incrementing a character (which is to say, invoking
.succ) seeks the next codepoint with the same Unicode Script and General
Category properties (major and minor category to be specific). For ranges,
succession is the same if .min and .max have the same properties, but if
they do not, then all codepoints are considered which are greater than
.min and smaller than .max and which agree with either the properties
of .min or the properties of .max.
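
To make the proposed wording concrete, here is a rough Python sketch of both rules. It rests on a loud assumption: the Python standard library does not expose the Script property, so the first word of the character name ("LATIN", "DIGIT", ...) is used as a stand-in. The function names props, succ and char_range are hypothetical, and endpoints are treated as inclusive.

```python
import unicodedata

def props(ch):
    """(general category, first word of the character name) for one character.

    Assumption: the first word of the name stands in for the Script
    property, which unicodedata does not provide.
    """
    name = unicodedata.name(ch, "")
    script_guess = name.split(" ")[0] if name else ""
    return unicodedata.category(ch), script_guess

def succ(ch):
    """Next codepoint whose props() match those of ch, or None if none exists."""
    wanted = props(ch)
    for cp in range(ord(ch) + 1, 0x110000):
        if props(chr(cp)) == wanted:
            return chr(cp)
    return None

def char_range(lo, hi):
    """Codepoints from lo to hi that share props() with either endpoint."""
    wanted = {props(lo), props(hi)}
    return "".join(chr(cp) for cp in range(ord(lo), ord(hi) + 1)
                   if props(chr(cp)) in wanted)

print(succ("A"))                      # next capital Latin letter
print(succ("\u0102"))                 # skips the lowercase letter at U+0103
print(char_range("\u0100", "\u0112")) # capitals only
print(char_range("0", "z"))           # digits, then lowercase Latin letters
```

Note that this stand-in cannot separate the small Katakana letters from the full-size ones, since both share the category Lo and the name prefix KATAKANA; that is exactly the case where the special handling mentioned above would be needed.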


Re: Suggested magic for a .. b

2010-07-20 Thread Darren Duncan

Aaron Sherman wrote:

2) The spec doesn't put this information anywhere near the definition of the
range operator. Perhaps we can make a note? This was a source of confusion
for me.


My impression is that a Range primarily defines an interval in terms of 2 
endpoint values such that it defines a possibly infinite set of values between 
those endpoints.


For example, 'aa'..'bb' is an infinite sized set that includes every possible 
character string that starts with the letter 'a', plus every one that starts 
with the string 'ba'.  And so, asking $anysuchstring ~~ 'aa'..'bb' is TRUE.


(Note that for .. to work, its 2 arguments would need to be of the same type, 
so that we know which set of rules to follow.  Or to be specific, the generic 
eqv operator, or before etc would have to be defined that takes both of the 
.. arguments as its arguments.  Although this might be fuzzed a bit if the 
spec defines somewhere about automatic casting.  For example, if someone said 
'foo'..42 then I would expect that to fail.)


A Range can also be used in a limited fashion to generate a finite list of 
values, but that is not its primary purpose, and the ... operator does that 
job much better.
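
The interval view can be sketched in a few lines of Python. This is an illustration only: the class name Interval is invented here, and Python compares strings by code point, which may differ from whatever collation cmp ends up using.

```python
class Interval:
    """A closed interval described by nothing but its two endpoints."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __contains__(self, value):
        # Membership only needs an ordering on the endpoint type;
        # no successor function and no enumeration is involved.
        return self.lo <= value <= self.hi

strings = Interval("aa", "bb")
print("az" in strings, "bazooka" in strings, "bc" in strings)

# Mixed endpoint types have no common ordering, so the test fails loudly.
try:
    5 in Interval("foo", 42)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

The interval holds arbitrarily many strings, yet answering a membership question never requires listing them.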



3) It seems that there are two competing multi-character approaches and both
seem somewhat valid. Should we use a pragma to toggle behavior between A and
B:

 A: aa .. bb contains az
 B: aa .. bb contains ONLY aa, ab, ba and bb


I would find A to be the only reasonable answer.

If you want B's semantics then use ... instead; .. should not be overloaded 
for that.


If there were to be any similar pragma, then it should control matters like 
collation, or what nationality/etc-specific subtype of Str the 'aa' and 'bb' 
are blessed into on definition, so that their collation/sorting/etc rules can be 
applied when figuring out if a particular $foo~~$bar..$baz is TRUE or not.


-- Darren Duncan


Re: Suggested magic for a .. b

2010-07-20 Thread Darren Duncan

Darren Duncan wrote:
specific, the generic eqv operator, or before etc would have to be 


Correction, I meant to say cmp, not eqv, here. -- Darren Duncan


Re: Suggested magic for a .. b

2010-07-18 Thread Moritz Lenz
Ruud H.G. van Tol wrote:
 Aaron Sherman wrote:
 
 Having established this range for each correspondingly indexed letter, the
 range for multi-character strings is defined by a left-significant counting
 sequence. For example:
 
 Ab .. Be
 
 defines the ranges:
 
 A B and b c d e
 
 This results in a counting sequence (with the most significant character on
 the left) as follows:
 
 Ab Ac Ad Ae Bb Bc Bd Be
 
 glob can do that:
 
 perl5.8.5 -wle 'print for <{A,B}{c,d,e}>'

Or Perl 6, for that matter :-)

 .say for <A B> X~ ('a' .. 'e')
Aa
Ab
Ac
Ad
Ae
Ba
Bb
Bc
Bd
Be

In general, stuffing more complex behaviour into something that feels
unintuitive is rarely (if ever) a good solution.

The current behaviour of the range operator is (if I recall correctly):

1) if both sides are single characters, make a range by incrementing
codepoints
2) otherwise, call .succ on the LHS. Stop before the generated values
exceed the RHS.

I'm not convinced it should be any more complicated than that. Remember
that with the series operator you can easily define your own
incrementation rules, and with meta operators (like the cross meta operator
demonstrated above) it's also easy to combine different series and lists.
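
For comparison, the same cross-product idea in Python (an illustrative sketch, not Perl 6). The helper positional_range is a made-up name; it builds one codepoint range per character position and crosses them, which yields the left-significant counting sequence from the proposal quoted above.

```python
from itertools import product

# The cross shown above: every first letter joined with every second letter.
cross = ["".join(pair) for pair in product("AB", "abcde")]

def positional_range(lo, hi):
    """Cross the per-position codepoint ranges of two equal-length strings."""
    if len(lo) != len(hi):
        raise ValueError("endpoints must have the same length")
    columns = [[chr(cp) for cp in range(ord(a), ord(b) + 1)]
               for a, b in zip(lo, hi)]
    # product() varies the rightmost column fastest, so the leftmost
    # character is the most significant one.
    return ["".join(chars) for chars in product(*columns)]

print(cross)
print(positional_range("Ab", "Be"))
```

Since the combination is expressed outside the range itself, the range can stay simple.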

Cheers,
Moritz


Re: Suggested magic for a .. b

2010-07-17 Thread Aaron Sherman
On Fri, Jul 16, 2010 at 9:40 PM, Michael Zedeler mich...@zedeler.dk wrote:


 What started it all, was the intention to extend the operator, making it
 possible to evaluate it in list context. Doing so has opened Pandora's box,
 because many (most? all?) solutions are inconsistent with the rule of least
 surprise.


I don't think there's any coherent expectation, and therefore no potential
to avoid surprise. Returning comic books might be more of a surprise, but as
long as you're returning a string which appears to be in the range
expressed, then I don't see surprise as the problem.



 For instance, when considering strings, writing up an expression like

 'goat' ~~ 'cow' .. 'zebra'

 This makes sense in most cases, because goat is lexicographically between
 cow and zebra.


This presumes that we're treating a string as a number in base x (where x,
I guess, would be the number of code points which share ... what, any of the
general category properties of the components of the input strings?).

That begins to get horrendously messy very, very fast:

 say 1aB .. aB1



 I'd suggest that if you want to evaluate a Range in list context, you may
 have to provide a hint to the Range generator, telling it how to generate
 subsequent values. Your suggestion that the expansion of 'Ab' ..  'Be'
 should yield Ab Ac Ad Ae Bb Bc Bd Be is just an example of a different
 generator (you could call it a different implementation of ++ on Str types).
 It does look useful, but by realizing that it probably is, we have two
 candidates for how Ranges should evaluate in list context.


I think the solution here is to evaluate what's practical in the general
case. Your examples are, I think, misleading because they involve English
words and we naturally leap to "sure, that one's in the dictionary between
the other two." However, let me pose this dictionary lookup for you:

 cliché ~~ aphorism .. truth

Now, you see where this is going? What happens when we throw in some
punctuation?

 father-in-law ~~ dad .. stranger

The problem is that you have a complex heuristic in mind for determining
membership, and a very simple operator for expressing the set. Worse, I
haven't even gotten into dealing with Unicode where it's entirely reasonable
to write TOPIXコンポジット1500構成銘柄 which I shamelessly grabbed from a Tokyo
Stock Exchange page. That one string, used in everyday text, contains Latin
letters, Hiragana, Katakana, Han or Kanji ideograms and Latin digits.

Meanwhile, back to .. ... the range operator. The most useful application
that I can think of for strings of length > 1 is for generating unique
strings such as for mktemp.

Beyond that, its application is actually quite limited, because the rules
for any other sort of string that might make sense to a human are absurdly
complex.

As such, I think it suffices to say that, for the most part, .. makes
sense for single-character strings, and to expand from there, rather than
trying to introduce anything more complex.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: Suggested magic for a .. b

2010-07-17 Thread Ruud H.G. van Tol

Aaron Sherman wrote:


Having established this range for each correspondingly indexed letter, the
range for multi-character strings is defined by a left-significant counting
sequence. For example:

Ab .. Be

defines the ranges:

A B and b c d e

This results in a counting sequence (with the most significant character on
the left) as follows:

Ab Ac Ad Ae Bb Bc Bd Be


glob can do that:

perl5.8.5 -wle 'print for <{A,B}{c,d,e}>'
Ac
Ad
Ae
Bc
Bd
Be



Currently, Rakudo produces this:

Ab, Ac, Ad, Ae, Af, Ag, Ah, Ai, Aj, Ak, Al, Am,
An, Ao, Ap, Aq, Ar, As, At, Au, Av, Aw, Ax, Ay,
Az, Ba, Bb, Bc, Bd, Be

which I don't think is terribly useful.


Good enough for me. For your variant, just override the .. for 'smarter' 
behavior?


--
Ruud



Re: Suggested magic for a .. b

2010-07-16 Thread yary
On Fri, Jul 16, 2010 at 9:40 AM, Aaron Sherman a...@ajs.com wrote:
 For example:

 Ab .. Be

 defines the ranges:

 A B and b c d e

 This results in a counting sequence (with the most significant character on
 the left) as follows:

 Ab Ac Ad Ae Bb Bc Bd Be

 Currently, Rakudo produces this:

 Ab, Ac, Ad, Ae, Af, Ag, Ah, Ai, Aj, Ak, Al, Am,
 An, Ao, Ap, Aq, Ar, As, At, Au, Av, Aw, Ax, Ay,
 Az, Ba, Bb, Bc, Bd, Be

There is one case where Rakudo's current output makes more sense than
your proposal, and that's when the sequence is analogous to a range of
numbers in another base, and you don't want to start at the equivalent
of '' or end up at the equivalent of ''. But that's a less
usual case and there's a workaround. Using your method and example, Ab
.. Az, Ba .. Be would reproduce what Rakudo does now.

In general, I like it. Though it does mean that the sequence generated
incrementing Ab repeatedly will diverge from Ab .. Be after 4
iterations.
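
The divergence is easy to reproduce. The sketch below is Python, for illustration only; str_succ is a hypothetical helper that mimics a magic string increment for ASCII letters only (z wraps to a and Z wraps to A, with carry), which is an assumption and not Rakudo's actual implementation.

```python
def str_succ(s):
    """Increment a non-empty string of ASCII letters, carrying to the left."""
    if not s:
        raise ValueError("cannot increment an empty string")
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] == "z":
            chars[i] = "a"      # wrap and carry
        elif chars[i] == "Z":
            chars[i] = "A"      # wrap and carry
        else:
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        i -= 1
    # Every position wrapped: grow the string by one character.
    return ("a" if s[0] == "z" else "A") + "".join(chars)

def succ_sequence(start, stop):
    """Repeatedly increment start until stop is reached (or overshot in length)."""
    out = [start]
    while out[-1] != stop and len(out[-1]) <= len(stop):
        out.append(str_succ(out[-1]))
    return out

current = succ_sequence("Ab", "Be")                 # increment-based listing
proposed = "Ab Ac Ad Ae Bb Bc Bd Be".split()        # sequence from the proposal

# Count how many leading elements the two sequences share.
shared = 0
for a, b in zip(current, proposed):
    if a != b:
        break
    shared += 1

print(len(current), shared)
```

The two sequences agree on Ab, Ac, Ad and Ae, then part ways: incrementing continues with Af while the proposal jumps to Bb.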

-y


Re: Suggested magic for a .. b

2010-07-16 Thread Carl Mäsak
Aaron ():
 [...]

 Many useful results from this suggested change:

 C .. A = C B A (Rakudo: )

Regardless of the other traits of your proposed semantics, I think
permitting reversed ranges such as the one above would be a mistake.

Rakudo gives the empty list for ranges whose lhs exceeds (fsvo
exceeds) its rhs, because that's the way ranges work in Perl. The
reason ranges work that way in Perl (in my understanding) is that it's
the less surprising behavior when the endpoints are determined at
runtime.

For explicitly specifying a reverse list of characters, there's still
`reverse A .. C`, which is not only a straightforward idiom and
huffmanized about right, but also good documentation for the reader.
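
The same convention exists elsewhere. As an illustrative Python sketch (codepoint_range is a made-up helper, not anything from Perl or Rakudo), a range whose start lies past its end is simply empty, and reversal is requested explicitly:

```python
def codepoint_range(lo, hi):
    """Characters from lo to hi inclusive; empty when lo sorts after hi."""
    return [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]

forward = codepoint_range("A", "C")
backward = codepoint_range("C", "A")              # empty, no silent reversal
explicit = list(reversed(codepoint_range("A", "C")))

print(forward, backward, explicit)
```

With endpoints computed at runtime, the empty result means a loop body just does not run, instead of unexpectedly counting downwards.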

// Carl


Re: Suggested magic for a .. b

2010-07-16 Thread Michael Zedeler

On 2010-07-16 18:40, Aaron Sherman wrote:

Oh bother, I wrote this up last night, but forgot to send it. Here y'all go:

I've been testing .. recently, and it seems, in Rakudo, to behave like
Perl 5. That is, the magic auto-increment for a .. z works very wonkily,
given any range that isn't within some very strict definitions (identical
Unicode general category, increasing, etc.) So the following:

A .. z

produces very odd results.

I'd like to suggest that we re-define this operator on strings as follows:

[cut]

Ab .. Be

defines the ranges:

A B and b c d e

This results in a counting sequence (with the most significant character on
the left) as follows:

Ab Ac Ad Ae Bb Bc Bd Be

Currently, Rakudo produces this:

Ab, Ac, Ad, Ae, Af, Ag, Ah, Ai, Aj, Ak, Al, Am,
An, Ao, Ap, Aq, Ar, As, At, Au, Av, Aw, Ax, Ay,
Az, Ba, Bb, Bc, Bd, Be

which I don't think is terribly useful.
   
I have been discussing the Range operator before on this list, and since 
it often becomes the topic of discussion, something must be wrong with it.


What started it all, was the intention to extend the operator, making it 
possible to evaluate it in list context. Doing so has opened Pandora's 
box, because many (most? all?) solutions are inconsistent with the rule 
of least surprise.


For instance, when considering strings, writing up an expression like

'goat' ~~ 'cow' .. 'zebra'

This makes sense in most cases, because goat is lexicographically 
between cow and zebra. So we have a nice ordering of strings that even 
extends to strings of any length (note that the three words used in my 
example are 3, 4 and 5 letters). As you can see, we even have a Range 
operator in there, so everything should be fine. What breaks everything 
is that we expect the Range operator to be able to generate all values 
between the two provided endpoints. Everything goes downhill from there.


With regard to strings, lexicographical ordering is the only prevailing 
ordering we provide the developer with (apart from length which doesn't 
provide a strict ordering that is needed). So anyone using the Range 
operator would assume that when lexicographical ordering is used for 
Range membership test, it is also used for generation of its members, 
naturally leading to the infinite sequence


cow
cowa
cowaa
cowaaa
...
cowb
cowba
cowbaa

For some reason (even though Perl6 supports infinite lists) we are 
currently using a completely new construct: the domain of strings 
limited to the length of the longest operand. This is counterintuitive 
since


'cowbell' ~~ 'cow' .. 'zebra'

but

'cow' .. 'zebra'

does not produce 'cowbell' in list context.

Same story applies to other types that come with a natural ordering, but 
have an over countable domain. Although the solutions differ, the main 
problem is the same - they all behave counterintuitively.


5.0001 ~~ 1.1 .. 10.1

but

1.1 .. 10.1

does not (and really shouldn't!) produce 5.0001 in list context.

I'd suggest that if you want to evaluate a Range in list context, you 
may have to provide a hint to the Range generator, telling it how to 
generate subsequent values. Your suggestion that the expansion of 'Ab' 
..  'Be' should yield Ab Ac Ad Ae Bb Bc Bd Be is just an example of a 
different generator (you could call it a different implementation of ++ 
on Str types). It does look useful, but by realizing that it probably 
is, we have two candidates for how Ranges should evaluate in list context.
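
One way to picture such a hint, sketched in Python for illustration (the generate function and its step parameter are invented names, and exact fractions are used to sidestep floating-point noise):

```python
from fractions import Fraction

def generate(lo, hi, step):
    """Yield lo, step(lo), step(step(lo)), ... while the value stays <= hi.

    The caller must pass a step that strictly increases its argument,
    otherwise the loop never terminates.
    """
    value = lo
    while value <= hi:
        yield value
        value = step(value)

lo, hi = Fraction(11, 10), Fraction(101, 10)      # 1.1 and 10.1
by_one = list(generate(lo, hi, lambda x: x + 1))  # 1.1, 2.1, ..., 10.1

probe = Fraction("5.0001")
is_member = lo <= probe <= hi        # membership uses only the ordering
is_generated = probe in by_one       # generation depends on the chosen step

print(len(by_one), is_member, is_generated)
```

Membership and generation are answered by different machinery here, so the value 5.0001 can belong to the interval without ever being produced.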


The same applies to Numeric types.

My suggestion is to eliminate the succ method on Rat, Complex, Real and 
Str and point people in the direction of the series operator if they 
need to generate sequences of things that are over countable.


Regards,

Michael.



Re: Suggested magic for a .. b

2010-07-16 Thread Jon Lang
Aaron Sherman wrote:
 Oh bother, I wrote this up last night, but forgot to send it. Here y'all
 go:

 I've been testing .. recently, and it seems, in Rakudo, to behave like
 Perl 5. That is, the magic auto-increment for a .. z works very
 wonkily,
 given any range that isn't within some very strict definitions (identical
 Unicode general category, increasing, etc.) So the following:

 A .. z

 produces very odd results.

Bear in mind that .. is no longer supposed to be used to generate
lists; for that, you should use ... instead.  That said, that doesn't
address the issues you're raising; it merely spreads them out over two
operators (.. when doing pattern matching, and ... when doing list
generation).

Your restrictions and algorithms are a good start, IMHO; and at some
point when I have the time, energy, and know-how, I'll read through
them in detail and comment on them.  In the meantime, though, let me
point out a fairly obvious point: sometimes, I want my pattern
matching and list generation to be case-sensitive; other times, I
don't.  More generally, whatever algorithm you decide on should be
subject to tweaking by the user to more accurately reflect his
desires.  So perhaps .. and ... should have an adverb that lets
you switch case sensitivity on (if the default is off) or off (if
the default is on).  And if you do this, there should be function
forms of .. and ... for those of us who have trouble working with
the rules for applying adverbs to operators.  Likewise with other
situations where there might be more than one way to approach things.

-- 
Jonathan Dataweaver Lang


Re: Suggested magic for a .. b

2010-07-16 Thread Aaron Sherman
On Fri, Jul 16, 2010 at 1:14 PM, yary not@gmail.com wrote:

 There is one case where Rakudo's current output makes more sense than
 your proposal, and that's when the sequence is analogous to a range of
 numbers in another base, and you don't want to start at the equivalent
 of '' or end up at the equivalent of ''.


If you want a range of numbers, you should be using numbers. Perl should
absolutely not try to guess that you want codepoints to appear in your
result set which were not either expressed in the input or fall between the
range of any two corresponding input codepoints.


 But that's a less
 usual case and there's a workaround. Using your method and example, Ab
 .. Az, Ba .. Be would reproduce what Rakudo does now.


Quite true.



 In general, I like it. Though it does mean that the sequence generated
 incrementing Ab repeatedly will diverge from Ab .. Be after 4
 iterations.


Also true, and I think that's a correct thing.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs