Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Jon Lang wrote:
> Darren Duncan wrote:
>> This said, I specifically think that a simple pair of curly braces is the best way to mark a Set.
>>
>>     {1,2,3} # a Set of those 3 elements
>>
>> ... and this is also how it is done in maths I believe (and in Muldis D). In fact, I strongly support this assuming that all disambiguation eg with hashes can be specified.
>
> That would be great.

Glad you agree.

> snip
> Sets built from multi-dimensional arrays might be a problem:
>
>     {1, 2, 3: 4, 5, 6}

Does that even work? I thought the colon, or is it a semicolon, only had that meaning in a delimited list like () or []. In any event, I don't believe there is such a thing as a multi-dimensional set in that way. Unless you have a concept of multi-dimensional Hash keys, and then there might be an analogy.

> snip
>> As for bags, well I think that is where we could get fancier. But *no* doubling up, as we don't want to interfere with nesting. Instead, it is common in maths to associate a + with set syntax to refer to bags instead. So, does Perl already ascribe a meaning to putting a + with various bracketing characters, such as this:
>>
>>     +{1,2,2,5} # a Bag of 4 elements with 2 duplicates
>>     +{} # an empty Bag, unless that already means something
>>
>> So would the above try to cast the collection as a number, or take the count of its elements, or can we use something like that?
>
> I'd expect +{...} to count the elements.

Something else I just thought of, and my main reason for writing this reply, is other options.

Firstly, and I don't necessarily like this option, maybe we could use the simple curly-brace pair to mean something more general that can be treated as either a Set or a Bag depending on context. At least from my brief look around, it appears that maths uses the same {foo, bar, baz} syntax to denote both sets and bags. In some ways it would be like how Perl has the generic (foo, bar, baz) syntax, which remembers order but isn't an Array.
We certainly can't use the presence of duplicates in the {...} to pick Set vs Bag, because there could legitimately be duplicates or no duplicates in the literals for both, especially if any of the list items are variables and we won't know until runtime whether any duplicate each other or not.

I still think the better option is to have slightly different looking syntax for the two. I still prefer Set being the plain brace pair and a Bag being that plus something extra. It seems that a leading + or ~ or ? is out because those have established meanings as treating what they're next to in num/str/bool context, so something else. But it really should be a leading symbolic. The differentiator needs to be leading, not trailing; end-weight is bad.

I think that having the marker character /inside/ the curly braces actually gives us more choices and would cut down on syntactic conflicts, because then we can basically pick anything that isn't a symbolic prefix unary. Barring a better suggestion, I suggest the greater-than symbol. So:

    {1,2,3,3,4} # 4-element Set
    {>1,2,3,3,4} # 5-element Bag

I think that looks different than anything else we have, and the greater-than could be a mnemonic that there is more in here. Moreover, the different appearance means we could use => to give an element an explicit count, {>1,2,3=>2,4}, without there being a confusion with a Hash.

That said, I like the + most when differentiating a Bag from a Set, but we have that symbolic unary + which could interfere with it.

-- Darren Duncan
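A small sketch of the runtime point above, written in Python only because the Perl 6 literal syntax is exactly what is under debate here. Python's built-in `set` and `collections.Counter` stand in for Set and Bag as an assumption; the variable names are invented for the illustration.

```python
from collections import Counter

# Values that might only be known at runtime; two of them happen to coincide.
x, y = 3, 3
items = [1, 2, x, y, 4]

as_set = set(items)      # duplicates collapse
as_bag = Counter(items)  # duplicates are counted

set_size = len(as_set)           # number of distinct elements
bag_size = sum(as_bag.values())  # number of elements including duplicates

# The "counted" form of the same Bag: 3 is given an explicit count of 2.
counted_bag = Counter({1: 1, 2: 1, 3: 2, 4: 1})
```

The same list of items gives a different collection depending on which constructor is applied, which is why the literal itself, and not the presence of duplicates, has to say which one is meant.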
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
On 11/7/10 23:19, Jon Lang wrote:
>     1 -- 2 -- 3
>
> Would be a Bag containing three elements: 1, 2, and 3. Personally, I wouldn't put a high priority on this; for my purposes, Bag(1, 2, 3) works just fine.

Hm. Bag as [! 1, 2, 3 !] and Set as {! 1, 2, 3 !}, with the outer bracket by analogy with arrays or hashes respectively and the ! having the mnemonic of looking like handles? (I have to imagine there are plenty of Unicode brackets to match.)

-- 
brandon s. allbery [linux,solaris,freebsd,perl] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Brandon S Allbery KF8NH wrote:
> On 11/7/10 23:19, Jon Lang wrote:
>>     1 -- 2 -- 3
>>
>> Would be a Bag containing three elements: 1, 2, and 3. Personally, I wouldn't put a high priority on this; for my purposes, Bag(1, 2, 3) works just fine.
>
> Hm. Bag as [! 1, 2, 3 !] and Set as {! 1, 2, 3 !}, with the outer bracket by analogy with arrays or hashes respectively and the ! having the mnemonic of looking like handles? (I have to imagine there are plenty of Unicode brackets to match.)

That saves a single character over Bag( ... ) and Set( ... ), respectively (or three characters, if you find decent unicode bracket choices). It still wouldn't be a big enough deal to me to bother with it.

As well, my first impression upon seeing [! ... !] was to think "you're negating everything inside?"

That said, I could get behind doubled brackets:

    [[1, 2, 3]] # same as Bag(1, 2, 3)
    {{1, 2, 3}} # same as Set(1, 2, 3)

AFAIK, this would cause no conflicts with existing code. Or maybe these should be reversed:

    [[1, 1, 2, 3]] # a Set containing 1, 2, and 3
    {{1, 1, 2, 3}} # a Bag containing two 1s, a 2, and a 3
    {{1 => 2, 2 => 1, 3 => 1}} # another way of defining the same Bag, with explicit counts.

OTOH, perhaps the outermost character should always be a square brace, to indicate that it operates primarily like a list; while the innermost character should be either a square brace or a curly brace, to hint at the kind of syntax that you might find inside:

    [[1, 1, 2, 3]] # a Set containing 1, 2, and 3
    [{1, 1, 2, 3}] # a Bag containing two 1s, a 2, and a 3
    [{1 => 2, 2 => 1, 3 => 1}] # another way of defining the same Bag, with explicit counts.

Yeah; I could see that. The only catch is that it might cause problems with existing code that nests square or curly braces inside of square braces:

    [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # fail; would try to create Set from "1, 2, 3], [4, 5, 6], [7, 8, 9"
    [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ] # creates 3-by-3 array

...so maybe not.

It should never be more than two characters on either side; and there's some benefit to using square or curly braces as one of them, to hint at proper syntax within. Hmm... how about:

    |[1, 2, 3]| # Set literal
    |[1=>true, 2=>true, 3=>true]| # technically possible; but why do it?
    |{1, 1, 2, 3}| # Bag literal
    |{1=>2, 2=>1, 3=>1}| # counted Bag literal

-- Jonathan Dataweaver Lang
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Jonathan Lang ():
> As well, my first impression upon seeing [! ... !] was to think "you're negating everything inside?"
>
> That said, I could get behind doubled brackets:
>
>     [[1, 2, 3]] # same as Bag(1, 2, 3)
>     {{1, 2, 3}} # same as Set(1, 2, 3)
>
> AFAIK, this would cause no conflicts with existing code. Or maybe these should be reversed:
>
>     [[1, 1, 2, 3]] # a Set containing 1, 2, and 3
>     {{1, 1, 2, 3}} # a Bag containing two 1s, a 2, and a 3
>     {{1 => 2, 2 => 1, 3 => 1}} # another way of defining the same Bag, with explicit counts.
>
> OTOH, perhaps the outermost character should always be a square brace, to indicate that it operates primarily like a list; while the innermost character should be either a square brace or a curly brace, to hint at the kind of syntax that you might find inside:
>
>     [[1, 1, 2, 3]] # a Set containing 1, 2, and 3
>     [{1, 1, 2, 3}] # a Bag containing two 1s, a 2, and a 3
>     [{1 => 2, 2 => 1, 3 => 1}] # another way of defining the same Bag, with explicit counts.
>
> Yeah; I could see that. The only catch is that it might cause problems with existing code that nests square or curly braces inside of square braces:
>
>     [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # fail; would try to create Set from "1, 2, 3], [4, 5, 6], [7, 8, 9"
>     [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ] # creates 3-by-3 array
>
> ...so maybe not. It should never be more than two characters on either side; and there's some benefit to using square or curly braces as one of them, to hint at proper syntax within. Hmm... how about:
>
>     |[1, 2, 3]| # Set literal
>     |[1=>true, 2=>true, 3=>true]| # technically possible; but why do it?
>     |{1, 1, 2, 3}| # Bag literal
>     |{1=>2, 2=>1, 3=>1}| # counted Bag literal

After skimming all those suggestions, I have yet another proposal: let's not add anything. All of this creates marginal gain at the cost of lots of extra syntax.

> That saves a single character over Bag( ... ) and Set( ... ), respectively (or three characters, if you find decent unicode bracket choices). It still wouldn't be a big enough deal to me to bother with it.

+1.
Let's leave it at that. // Carl
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Carl Mäsak wrote:
> Jonathan Lang ():
>> That saves a single character over Bag( ... ) and Set( ... ), respectively (or three characters, if you find decent unicode bracket choices). It still wouldn't be a big enough deal to me to bother with it.
>
> +1. Let's leave it at that.

That said, I do think that Bag( ... ) should be able to take pairs, so that one can easily create a Bag that holds, say, twenty of a given item, without having to spell out the item twenty times.

Beyond that, the only other syntax being proposed is a set of braces to be used to create Bags and Sets, as part of the initiative to make them nearly as easy to use as lists. In essence, you'd be introducing two operators: circumfix:<|[ ]|> and circumfix:<|{ }|>, as aliases for the respective Set and Bag constructors. As I said, it's not a big deal - either way.

Really, my main issue remains the choice of sigil for a variable that's supposed to hold baggy containers.

-- Jonathan Dataweaver Lang
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Jon Lang wrote:
> That saves a single character over Bag( ... ) and Set( ... ), respectively (or three characters, if you find decent unicode bracket choices). It still wouldn't be a big enough deal to me to bother with it. As well, my first impression upon seeing [! ... !] was to think "you're negating everything inside?" That said, I could get behind doubled brackets:
>
>     [[1, 2, 3]] # same as Bag(1, 2, 3)
>     {{1, 2, 3}} # same as Set(1, 2, 3)
> snip

I prefer to have the mnemonic that {} means unordered and that [] means ordered, so please stick to [] meaning arrays or ordered collections, and {} meaning unordered collections, so set and bag syntax should be based around {} if either.

This said, I specifically think that a simple pair of curly braces is the best way to mark a Set. So:

    {1,2,3} # a Set of those 3 elements

... and this is also how it is done in maths I believe (and in Muldis D). In fact, I strongly support this assuming that all disambiguation eg with hashes can be specified.

    {a=>1,b=>2} # a Hash of 2 pairs
    {:a<1>, :a<2>} # we'll have to pick a meaning
    {} # we'll have to pick a meaning (Muldis D makes it a Set; %:{} is its Hash)
    {;} # an anonymous sub or something
    {a=>1} # Hash
    {1} # Set
    {1;} # anonymous sub or something

But keep that simple and let nesting work normally, so:

    {{1}} # a Set of 1 element that is a Set of 1 element
    {{a=>1}} # a Set with 1 Hash element
    {[1]} # a Set with 1 Array element
    [{1}] # an Array with 1 Set element

In certain cases, we can always still fall back to this:

    Set() # empty Set
    Hash() # empty Hash
    Set(:a<1>) # if that's what we wanted

As for bags, well I think that is where we could get fancier. But *no* doubling up, as we don't want to interfere with nesting. Instead, it is common in maths to associate a + with set syntax to refer to bags instead.

So, does Perl already ascribe a meaning to putting a + with various bracketing characters, such as this:

    +{1,2,2,5} # a Bag of 4 elements with 2 duplicates
    +{} # an empty Bag, unless that already means something

So would the above try to cast the collection as a number, or take the count of its elements, or can we use something like that? But I would recommend something along those lines. I suppose then if +{} works for bags we could alternately use -{} for sets but I don't really like it.

-- Darren Duncan
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
On Sun, Nov 14, 2010 at 12:12 AM, Jon Lang datawea...@gmail.com wrote:
> Carl Mäsak wrote:
>> Jonathan Lang ():
>>> That saves a single character over Bag( ... ) and Set( ... ), respectively (or three characters, if you find decent unicode bracket choices). It still wouldn't be a big enough deal to me to bother with it.
>>
>> +1. Let's leave it at that.
>
> That said, I do think that Bag( ... ) should be able to take pairs, so that one can easily create a Bag that holds, say, twenty of a given item, without having to spell out the item twenty times.

Doesn't the xx operator cover this?

Eirik
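For comparison, the two ways of getting "twenty of a given item" can be sketched in Python, with `collections.Counter` standing in for Bag (an assumption, not the Perl 6 API). List repetition plays the role that xx would play, and the item name is made up.

```python
from collections import Counter

# Spelling the item out twenty times via list repetition (roughly what xx would do) ...
repeated = Counter(["apple"] * 20)

# ... versus giving the item an explicit count, as a pair-taking constructor would.
from_pair = Counter({"apple": 20})

same_bag = repeated == from_pair
```

Both routes end in the same bag; the repetition route builds a twenty-element list first, while the pair route states the count directly.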
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Darren Duncan wrote:
> Jon Lang wrote:
>> That saves a single character over Bag( ... ) and Set( ... ), respectively (or three characters, if you find decent unicode bracket choices). It still wouldn't be a big enough deal to me to bother with it. As well, my first impression upon seeing [! ... !] was to think "you're negating everything inside?" That said, I could get behind doubled brackets:
>>
>>     [[1, 2, 3]] # same as Bag(1, 2, 3)
>>     {{1, 2, 3}} # same as Set(1, 2, 3)
>> snip
>
> I prefer to have the mnemonic that {} means unordered and that [] means ordered, so please stick to [] meaning arrays or ordered collections, and {} meaning unordered collections, so set and bag syntax should be based around {} if either.
>
> This said, I specifically think that a simple pair of curly braces is the best way to mark a Set. So:
>
>     {1,2,3} # a Set of those 3 elements
>
> ... and this is also how it is done in maths I believe (and in Muldis D). In fact, I strongly support this assuming that all disambiguation eg with hashes can be specified.

That would be great.

>     {a=>1,b=>2} # a Hash of 2 pairs
>     {:a<1>, :a<2>} # we'll have to pick a meaning

My preference would be for this to be a Set that contains two items in it, both of which are pairs. IIRC, there's already behavior along these lines when it comes to pairs.

>     {} # we'll have to pick a meaning (Muldis D makes it a Set; %:{} is its Hash)

Is there any difference between an empty Set and an empty Hash? If so, is one more general than the other? Just as importantly, what does {} do right now?

>     {;} # an anonymous sub or something
>     {a=>1} # Hash
>     {1} # Set
>     {1;} # anonymous sub or something

Sets built from multi-dimensional arrays might be a problem:

    {1, 2, 3: 4, 5, 6}

> But keep that simple and let nesting work normally, so:
>
>     {{1}} # a Set of 1 element that is a Set of 1 element
>     {{a=>1}} # a Set with 1 Hash element
>     {[1]} # a Set with 1 Array element
>     [{1}] # an Array with 1 Set element
>
> In certain cases, we can always still fall back to this:
>
>     Set() # empty Set
>     Hash() # empty Hash
>     Set(:a<1>) # if that's what we wanted
>
> As for bags, well I think that is where we could get fancier. But *no* doubling up, as we don't want to interfere with nesting. Instead, it is common in maths to associate a + with set syntax to refer to bags instead. So, does Perl already ascribe a meaning to putting a + with various bracketing characters, such as this:
>
>     +{1,2,2,5} # a Bag of 4 elements with 2 duplicates
>     +{} # an empty Bag, unless that already means something
>
> So would the above try to cast the collection as a number, or take the count of its elements, or can we use something like that?

I'd expect +{...} to count the elements.

-- Jonathan Dataweaver Lang
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
On Tuesday, 9. November 2010 01:45:52 Mason Kramer wrote:
> I have to disagree here. Arrays and Hashes may be about storage (I don't think they are, though, since you can change the (storage) implementation of an Array or Hash via its metaclass and it can still remain an Array or Hash).

What I mean with storage is that you put some data into a numbered slot in an array and a keyed slot into a hash. With the same index or key you can retrieve your data at any time. This is the case irrespective of the underlying implementation. A set is not about storage in this sense, because there is no way of retrieving an element. The only operation is a membership test which is of boolean nature like number comparison.

> The most important part of the @ sigil, and the reason I preferred it over $, is that @ flattens (moritz++'s word), when used in a list context such as for @blah, map {...}, @blah.

I wonder if it is not possible to bind flattening to Iterable. This of course has the drawback that it is not syntactically distinguished. But doesn't

    my $x = (1,2,3);
    my $y = map {$^x * $^x}, $x;

result in $y containing the list (1,4,9)? And if $x happens to be a scalar isn't it just squared? In the end we just need map:( closure, Set $data --> Set) as an overload. Or perhaps map:( closure, Iterable ::T $data --> T).

Regards, TSa.
-- 
The unavoidable price of reliability is simplicity -- C.A.R. Hoare
Simplicity does not precede complexity, but follows it. -- A.J. Perlis
1 + 2 + 3 + 4 + ... = -1/12 -- Srinivasa Ramanujan
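The "map over a Set gives a Set" overload suggested above could be sketched like this in Python. `map_set` is a hypothetical helper name, and Python's built-in `set` is only an assumed stand-in for the Perl 6 Set type.

```python
def map_set(closure, data: set) -> set:
    """Apply closure to every member; the result is again a set (hypothetical helper)."""
    return {closure(member) for member in data}


squares = map_set(lambda n: n * n, {1, 2, 3})

# There is no slot to retrieve an element from; membership is the only question to ask.
has_four = 4 in squares
has_five = 5 in squares
```

Note that the result type follows the input type here, which is the point of the second signature with the type parameter T.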
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
On 11/09/2010 09:26 PM, TSa (Thomas Sandlaß) wrote:
> But doesn't
>
>     my $x = (1,2,3);
>     my $y = map {$^x * $^x}, $x;
>
> result in $y containing the list (1,4,9)?

Not at all. The $ sigil implies a scalar, so what you get is roughly

    my $y = (1, 2, 3).item * (1, 2, 3).item;

so $y ends up with a single list item of 9.

Cheers,
Moritz
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
On Sun, Nov 7, 2010 at 11:19 PM, Jon Lang datawea...@gmail.com wrote:
> Mason Kramer wrote:
>> I'd like to anticipate one objection to this - the existence of the 'hyper' operator/keyword. The hyper operator says, I am taking responsibility for this particular code block and promising that it can execute out of order and concurrently. Creating a Bag instead of an Array says, there is no meaning to the ordering of this group of things, ever. Basically, if I know at declaration time that my collection has no sense of ordering, then I shouldn't have to annotate every iteration of that collection as having no sense of ordering, which is nearly what hyper does (though, I readily admit, not quite, because there are unordered ways to create race conditions).
>
> My understanding of the hyperoperator is that its primary use is to say operate on the individual elements of this collection, instead of on the collection itself. In that regard, it's just as applicable to Bags and Sets as it is to lists. Except...
>
> Except that the hyperoperator assumes that the collections are ordered. It matches the first element on the left with the first element on the right; the second element on the left with the second on the right; and so on. Bags and Sets don't have a useful notion of first, second, etc. So what should happen if I try to apply a hyperoperator with a Bag or Set on one side?

Well, hyperoperators work fine on Hashes, they operate on the values, paired up by key if needed. (That is, %hash>>++ doesn't care about the keys, %hash1 >>+<< %hash2 sums based on keys.) I would assume that Bag should work in the exact same way. Dunno how Set should work in this context, though.

-- 
Solomon Foster: colo...@gmail.com
HarmonyWare, Inc: http://www.harmonyware.com
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Solomon Foster wrote:
> Well, hyperoperators work fine on Hashes, they operate on the values, paired up by key if needed. (That is, %hash>>++ doesn't care about the keys, %hash1 >>+<< %hash2 sums based on keys.) I would assume that Bag should work in the exact same way. Dunno how Set should work in this context, though.

I would hope that Bags would not work the same way. If they do, then you get things like:

    Bag(1, 3, 2, 1) >>+<< Bag(2, 3, 1, 2) # same as Bag(1, 1, 1, 2, 2, 2, 3, 3)

I'm not sure how (or even if) Bags _should_ work in this context; but the above is definitely not what I'd expect.

IMHO, a key point about Bags and Sets (no pun intended) is that the values of the elements _are_ the keys; the existence of separate values (unsigned integers in the case of Bags; booleans in the case of Sets) is - or should be - mostly a bookkeeping tool that rarely shows itself.

Incidentally, we might want to set up a role to define the shared behavior of Bags, Sets, et al. My gut instinct would be to call it Baggy; Setty would make the stargazers happy, but otherwise wouldn't mean much. With this, you could do things like creating a FuzzySet that stores a number between zero and one for each key, but which otherwise behaves like a Set.

-- Jonathan Dataweaver Lang
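The key-wise reading that is being objected to above can be reproduced in Python, where `collections.Counter` addition happens to pair entries up by key and sum the counts. Counter is only an assumed stand-in for a Perl 6 Bag, and the variable names are invented.

```python
from collections import Counter

left = Counter([1, 3, 2, 1])
right = Counter([2, 3, 1, 2])

# Counter addition pairs entries up by key and sums the counts,
# i.e. the "values are the counts" reading of a hyper-add on two bags.
summed = left + right

# Expand back into individual elements; sorted only to make the order stable.
flattened = sorted(summed.elements())
```

Under this reading no element value is ever added to another element value; only the hidden counts are combined, which is why the result looks like a bag union with multiplicity.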
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
I'm honored that my letter generated so much activity, and thank you all for your thoughtful responses. I'd like to address a few points.

On Monday, 8. November 2010 17:20:43 Jon Lang wrote:
> Solomon Foster wrote:
>> Well, hyperoperators work fine on Hashes, they operate on the values, paired up by key if needed. (That is, %hash>>++ doesn't care about the keys, %hash1 >>+<< %hash2 sums based on keys.) I would assume that Bag should work in the exact same way. Dunno how Set should work in this context, though.
>
> I would hope that Bags would not work the same way. If they do, then you get things like:
>
>     Bag(1, 3, 2, 1) >>+<< Bag(2, 3, 1, 2) # same as Bag(1, 1, 1, 2, 2, 2, 3, 3)

With respect to Bags and » and «, the spec has something to say (somewhere in S03):

    in fact, an upgraded scalar is the only thing that will work for an unordered type such as a Bag:

    Bag(3,8,2,9,3,8) >>->> 1; # Bag(2,7,1,8,2,7) === Bag(1,2,2,7,7,8)

This makes sense to me. I don't see how it could be otherwise. This code snippet also makes it clear that » and « operate on the keys of a Bag, and not the counts or pairs of a Bag. This also makes sense to me, since Bags ought to act much more like their keys than either their values or an EnumMap of k,v.

Please note that my original post did not address » and «, but rather the hyper keyword / adverb, as in hyper for { ... }.

On Nov 8, 2010, at 04:25 PM, TSa (Thomas Sandlaß) wrote:
> snip
> I'm generally very happy with the choice of sigil for Sets and Bags because this is what they are: scalars as far as storage is concerned. More important is to have the right set of operators that automatically imply Bags: (1,2,3,4) (+) (2,3) === Bag(1,2,2,3,3,4). Arrays and Hashes are about storage. In the abstract the memory of a computer is one big array! Sets and Bags are about operations on them like the numeric operations are on numbers or the string operators on strings. So it is very important to keep the domains nicely separated by means of disjoint operators. This is why we have ~ for concatenation and not overloaded +. It makes of course sense to iterate a Bag. But indexing it doesn't. We are also not indexing into strings: "blah"[2] is not 'a'.
> snip

I have to disagree here. Arrays and Hashes may be about storage (I don't think they are, though, since you can change the (storage) implementation of an Array or Hash via its metaclass and it can still remain an Array or Hash). But sigils are definitely not about the storage of the underlying data. Your own statement gives the contradiction - in actual storage in the memory of a computer, everything is somewhere in a big array. But yet, we don't prefix everything with an @ sigil. So clearly, sigils are about something else.

jnthn said today, in irc, that sigils are about an interface contract. Everyone seems to agree that they imply the Positional role (i.e., the postcircumfix:<[ ]> method), and that Rakudo heavily relies on this conflation, so I'm withdrawing the suggestion that @ means "does Iterable" instead of "does Positional".

The most important part of the @ sigil, and the reason I preferred it over $, is that @ flattens (moritz++'s word), when used in a list context such as for @blah, map {...}, @blah. Having Bags flatten in list context is pretty crucial to their being as easy and terse to use as arrays, because flattening is fundamental to how Arrays are used, and Bags will be used like Arrays.

Luckily, however, %, which implies the Associative contract, also flattens in list context. If Bags and Sets are sigiled with %, they should flatten, and all that remains is to make sure they flatten into a list of keys, and not a list of enums. Any thoughts on that?
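The spec example quoted above (a hyper operator acting on the keys of a Bag, with the counts carried along) can be modelled in Python as a rough sketch. `hyper_keys` is a hypothetical helper, and `collections.Counter` is an assumed stand-in for Bag; neither is the Perl 6 implementation.

```python
from collections import Counter


def hyper_keys(op, bag):
    """Apply op to each key, carrying its count along (hypothetical helper)."""
    result = Counter()
    for key, count in bag.items():
        # += rather than = so that keys which collide after op merge their counts
        result[op(key)] += count
    return result


bag = Counter([3, 8, 2, 9, 3, 8])
shifted = hyper_keys(lambda k: k - 1, bag)

# Flattening into keys with multiplicity, not into key/count pairs;
# sorted only to make the order stable.
flat = sorted(shifted.elements())
```

The flattened form is what a loop over the bag would see if bags flatten into a list of keys rather than a list of pairs.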
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
This is going to be a rambling answer, as I have a number of questions but no firm conclusions. Please bear with me.

Mason Kramer wrote:
> Having Bags flatten in list context is pretty crucial to their being as easy and terse to use as arrays, because flattening is fundamental to how Arrays are used, and Bags will be used like Arrays. Luckily, however, %, which implies the Associative contract, also flattens in list context. If Bags and Sets are sigiled with %, they should flatten, and all that remains is to make sure they flatten into a list of keys, and not a list of enums.

The only qualm that I have with using % as a prefix for baggy things is that % carries the connotation that you're dealing with key/value pairs. While baggy things _can_ be thought of as pairs, they're value/membership pairs rather than key/value pairs; and the membership side of the pair should be hidden from view unless explicitly requested. In short, a casual programmer ought to be encouraged to think of a baggy thing as being a collection of values; the % sigil implicitly encourages him to think of it as a collection of pairs.

That said, the problem with % is that baggies implement its features in an unusual way; the problem with @ is that baggies don't implement all of its features. Conceptually, @ (a collection of values) is a better fit than % (a collection of pairs); but technically, the reverse is true: % (does Associative) is a better fit than @ (does Positional).

Query: should %x.k, %x.v, %x.kv, and %x.pair produce lists, bags, or sets? As I understand it, right now all four produce lists. I could see a case for having %x.k and %x.pair produce sets, while %x.kv definitely should produce a list (since even though the overall order doesn't matter, which even element follows which odd element _does_ matter); and %x.v might reasonably produce a bag. OTOH, if this is done then there will be no way to talk about, e.g., %x.k[0].
I'm wondering if bags and sets _should_ do Positional, but with the caveat that the order is arbitrary. After all, that's what currently happens with %x.k: you get a list of the keys, but with the understanding that the order in which you get them is ultimately meaningless. Or is it that the difference between Iterable and Positional is that Positional provides random access to its membership, whereas Iterable only guarantees that you can walk through the members? Another way to phrase my concern is this: one reason why Perl 6 has gone with nominal typing rather than structural typing is that does x can and does promise more than just implements the same features that x implements; it also promises something about the way in which said features will be implemented. In this regard, I would argue that baggies should not do Associative; because even though they implement all of the same features that Associative promises to implement, they don't do so in a way that's compatible with Associative's underlying philosophy of keys and values. And if they don't do Associative, it doesn't make sense for them to use the % sigil. I hesitate to suggest this; but might we want a special sigil for Iterable, to be used when neither Positional nor Associative is quite right? Such a sigil might be useful for more than just baggies; for instance, a stream is Iterable but neither Positional nor Associative. -- Jonathan Dataweaver Lang
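As a loose analogy for the Iterable / Positional / Associative distinction discussed above, Python's abstract base classes make the same three-way split. This is only an analogy under the assumption that `collections.Counter` behaves like a baggy type; Python's `Iterable`, `Sequence` and `Mapping` are not the Perl 6 roles, and `walk` is a made-up function.

```python
from collections import Counter
from collections.abc import Iterable, Mapping, Sequence

bag = Counter("aabc")

is_iterable = isinstance(bag, Iterable)    # members can be walked
is_associative = isinstance(bag, Mapping)  # key -> count lookup is offered
is_positional = isinstance(bag, Sequence)  # False: bag[0] would be a key lookup, not "first element"


def walk(members: Iterable) -> list:
    """Needs only iteration, so lists, bags, sets and streams would all qualify."""
    return sorted(members)


walked = walk(bag.elements())
```

A function written against the weakest contract it needs (iteration only) accepts the widest range of collections, which is the practical argument for giving Iterable a way to be named on its own.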
Bag / Set ideas - making them substitutable for Arrays makes them more useful
I just implemented Bag to the point where it passes the spectests. (https://github.com/masonk/rakudo/commit/2668178c6ba90863538ea74cfdd287684a20c520) However, in doing so, I discovered that I'm not really sure what Bags are for, anymore. The more I think about Bags and Sets, the more my brain hurts. They're a half an EnumMap and half an Iterable that does Associative but not Positional. However, I'm starting to believe that they are more like Iterables than EnumMaps. When I imagine using them, I think of Sets as a cute way to operate on the unique elements of an Iterable. I think of Bags / KeyBags as a way to remove ordering, which is a generally useful thing (everything that I'm about to say applies to both Bags and KeyBags, but I'm going to only talk about Bags for the rest of this post). This is because, most of the time, we don't care about ordering, and having ordering on all of our collections even when we don't need it increases program complexity in time in a way that could be seen as analogous to the way in which unnecessarily global variables increased the space complexity of Perl 5. I want to propose one major change to the Bag spec: When a Bag is used as an Iterable, you get an Iterator that has each key in proportion to the number of times it appears in the Bag. With this one change to Bags, I could use them whenever I don't need ordering in my lists - which is usually. Even though there are some side effects that don't rely on ordering (e.g., incrementation), the majority of them do - so by using this new kind of Bag, I would be reducing the complexity of my programs. Now, since Sets already give us the distinct values, having Bags do the same thing seems like redundant functionality, where we could be getting novel functionality. I'd like to anticipate one objection to this - the existence of the 'hyper' operator/keyword. 
The hyper operator says, I am taking responsibility for this particular code block and promising that it can execute out of order and concurrently. Creating a Bag instead of an Array says, there is no meaning to the ordering of this group of things, ever. Basically, if I know at declaration time that my collection has no sense of ordering, then I shouldn't have to annotate every iteration of that collection as having no sense of ordering, which is nearly what hyper does (though, I readily admit, not quite, because there are unordered ways to create race conditions). I also have some convenience syntax suggestions. I do think this is important because Bags and Sets are competing with Arrays. If they aren't as convenient as Arrays to use, they won't get used - even though they're closer, semantically, to what the developer wants in a lot of cases. First, we should besigil Bags and Sets with @ instead of $. Without this convenience, I'm not likely to replace my Arrays with Bags, because going through them in a loop or map would be a pain compared to Arrays. If I have to say $bag.keys every single time, forgettaboutit. This, however, probably requires a change to S03, which says that the @ sigil is a means of coercing the object to the Positional (or Iterable?) role. It seems to me, based on the guiding principle that perl6 should support functional idioms and side-effect free computing, the more fundamental and important aspect of things with @ in front is that you can go through them one by one, and not that they're ordered (since ordering is irrelevant in functional computing, but iterating is not). My feeling is that we should reserve the special syntax for the more fundamental of the two operations, so as not to bias the programmer towards rigid sequentiality through syntax. Second, I would be even more likely to replace my ordered lists with Bags if there were a convenient operator for constructing Bags. 
I can't think of any good non-letter symbols that aren't taken right now (suggestions welcome), but, at least, b and s as aliases to bag and set would be convenient.

Bags and Sets thus updated would look like this in use:

    my @array = <a a b c>;
    my @set = s...@array;
    for s...@array { say $_ };
    for @set { say $_ };   # same thing
    # b«»a«»c«»   # ordering undefined
    # most common use case for sets, I think, is unique elements of @array, isn't it?

    hyper for @bag { ... };
    # a«»b«»c«» a«»   # ordering undefined => less-thinking-required hyper

    b <a b c c> === b <c c b a>
    # Wouldn't this be the best way to make a comparison with these semantics?
    # By the way, this useful idiom works as currently specced, but doesn't work in my implementation

    @bag{a}                   # 2
    @bag{a b z}               # 2, 1, 0
    [+] bag @array{a b z}     # 3
    # this is also neat for "How many a's, b's, and z's do I have?"
    +...@bag                  # 4
    @bag[2]                   # I can't think of a meaning for this - not Positional - S03 needs a change?
    @bag.WHAT                 # Bag()
    @bag.pairs                # a => 2, b => 1, c => 1   # ordering undefined
    @bag.values               # 2, 1, 1   # ordering undefined

Junctions:
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Mason Kramer wrote:
> <snip>
> I want to propose one major change to the Bag spec: When a Bag is used as an Iterable, you get an Iterator that has each key in proportion to the number of times it appears in the Bag.
> <snip>

You present some interesting thoughts here. But I don't have enough time to think about any implications to the point of agreeing or disagreeing with that change, other than to say that the proposal seems reasonable at first glance.

However, if the above proposal is done, I would still want an easy way to get the value-count pairs from a bag if I wanted them.

I do agree though with the principle that sets and bags should be just as easy and terse to use as arrays.

-- Darren Duncan
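As an illustration of the proposal quoted above, here is a minimal sketch in Python, with `collections.Counter` standing in for a Bag. This is an analogy only, not Perl 6 and not the Rakudo implementation: `elements()` plays the role of the proposed iteration (each key repeated once per occurrence), and plain indexing plays the role of the per-key count lookups. The variable names are made up for the example.

```python
from collections import Counter

# Stand-in for a Bag built from the list a, a, b, c.
bag = Counter(["a", "a", "b", "c"])

# Proposed iteration: each key shows up once per occurrence.
# Sorted only so that the result is reproducible.
expanded = sorted(bag.elements())

# Per-key counts; a key that is absent counts as 0.
counts = [bag[key] for key in ("a", "b", "z")]

# Total number of items versus number of distinct keys.
total = sum(bag.values())
distinct = len(bag)

print(expanded, counts, total, distinct)
```

Under this reading, the distinct keys (what a Set would give) and the count pairs remain available alongside the expanded iteration.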
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
Mason Kramer wrote:
> I'd like to anticipate one objection to this - the existence of the 'hyper' operator/keyword. The hyper operator says, "I am taking responsibility for this particular code block and promising that it can execute out of order and concurrently." Creating a Bag instead of an Array says, "there is no meaning to the ordering of this group of things, ever." Basically, if I know at declaration time that my collection has no sense of ordering, then I shouldn't have to annotate every iteration of that collection as having no sense of ordering, which is nearly what hyper does (though, I readily admit, not quite, because there are unordered ways to create race conditions).

My understanding of the hyperoperator is that its primary use is to say "operate on the individual elements of this collection, instead of on the collection itself". In that regard, it's just as applicable to Bags and Sets as it is to lists. Except...

Except that the hyperoperator assumes that the collections are ordered. It matches the first element on the left with the first element on the right; the second element on the left with the second on the right; and so on. Bags and Sets don't have a useful notion of "first", "second", etc. So what should happen if I try to apply a hyperoperator with a Bag or Set on one side?

The cross operators should also be looked at in this regard, though I anticipate fewer problems there.

> This, however, probably requires a change to S03, which says that the @ sigil is a means of coercing the object to the Positional (or Iterable?) role. It seems to me, based on the guiding principle that perl6 should support functional idioms and side-effect free computing, the more fundamental and important aspect of things with @ in front is that you can go through them one by one, and not that they're ordered (since ordering is irrelevant in functional computing, but iterating is not). My feeling is that we should reserve the special syntax for the more fundamental of the two operations, so as not to bias the programmer towards rigid sequentiality through syntax.

I tend to agree here - though to be clear, "my @x" should still normally result in a list, sans further embellishments (e.g., "my Bag @x").

> Second, I would be even more likely to replace my ordered lists with Bags if there were a convenient operator for constructing Bags. I can't think of any good non-letter symbols that aren't taken right now (suggestions welcome), but, at least, b and s as aliases to bag and set would be convenient.

Such a character ought to be some sort of punctuation, preferably of a type that's similar to the comma and semicolon. For a Bag, you might consider an emdash (—), with the ASCII equivalent being infix:<-->. So:

    1 -- 2 -- 3

would be a Bag containing three elements: 1, 2, and 3.

Personally, I wouldn't put a high priority on this; for my purposes, Bag(1, 2, 3) works just fine.

-- Jonathan "Dataweaver" Lang
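The pairing concern raised above can be illustrated with a small Python sketch. It is an analogy only: Python lists and `collections.Counter` stand in for Perl 6 arrays and Bags, and the helper names are hypothetical. An element-wise operation on a single collection gives the same multiset of results whatever the order, while pairing two collections position by position does not.

```python
from collections import Counter

def each_doubled(items):
    """Unary element-wise operation; results are collected as a multiset."""
    return Counter(x * 2 for x in items)

def pairwise_sums(left, right):
    """Binary element-wise operation that pairs items by position."""
    return Counter(l + r for l, r in zip(left, right))

left = [1, 2, 3]
right = [10, 20, 30]
right_reordered = [30, 10, 20]  # same contents, different order

# Order does not matter for the unary case.
unary_stable = each_doubled(right) == each_doubled(right_reordered)

# Order does matter once elements are paired by position.
paired_a = pairwise_sums(left, right)            # 11, 22, 33
paired_b = pairwise_sums(left, right_reordered)  # 31, 12, 23
binary_stable = paired_a == paired_b

print(unary_stable, binary_stable)
```

This suggests the unary case carries over to Bags and Sets without trouble, while the binary case would need some rule other than position to decide which elements meet.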
Re: Bag / Set ideas - making them substitutable for Arrays makes them more useful
On 11/08/2010 01:51 AM, Darren Duncan wrote:
> Mason Kramer wrote:
>> <snip>
>> I want to propose one major change to the Bag spec: When a Bag is used as an Iterable, you get an Iterator that has each key in proportion to the number of times it appears in the Bag.
>> <snip>
>
> You present some interesting thoughts here. But I don't have enough time to think about any implications to the point of agreeing or disagreeing with that change, other than to say that the proposal seems reasonable at first glance.
>
> However, if the above proposal is done, I would still want an easy way to get the value-count pairs from a bag if I wanted them.

There'd still be .kv and .pairs.

Cheers,
Moritz
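For comparison, the two views mentioned in this reply have rough analogues in Python's `collections.Counter`, used here purely as a stand-in for a Bag: a list of (key, count) pairs, and a flat list alternating keys and counts. The names `pairs` and `kv` are just labels for this sketch and are not meant to mirror the Perl 6 methods exactly.

```python
from collections import Counter

bag = Counter(["a", "a", "b", "c"])

# Analogue of .pairs: one (key, count) tuple per distinct key.
# Sorted only so that the result is reproducible.
pairs = sorted(bag.items())

# Analogue of .kv: keys and counts interleaved in one flat list.
kv = [part for pair in pairs for part in pair]

print(pairs)
print(kv)
```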