Re: Question about sets

Chas Emerick Sun, 05 Aug 2012 07:33:12 -0700

On Aug 5, 2012, at 2:56 AM, Sean Corfield wrote:

> On Sat, Aug 4, 2012 at 11:45 PM, Mark Engelberg
> <mark.engelb...@gmail.com> wrote:
>> In any case, Clojure is already able to detect when x and y are equal in
>> something like #{x y} and report it as an error.
> 
> So do you think #{1 1} should not be an error? And {:a 1 :a 2}? These
> seem like "obvious" programmer errors that I'd want the compiler to
> catch.


This past week I hit exactly the problem the OP raised.  I knew of the (good) 
restriction that makes e.g. #{1 1} an error, and remember some of the 
conversations leading up to the changes that made it so, but had either 
forgotten or never quite internalized that #{a b} is also an error when a and 
b are equivalent values.

First, some history:
Looks like 
http://groups.google.com/group/clojure/browse_thread/thread/5a38a6b61b09e025 
was the original thread where duplicate map keys first came up as an issue, 
although the differences between array-maps and hash-maps were the original 
impetus.
This led to http://dev.clojure.org/jira/browse/CLJ-87 being filed.
Rich's first change was to make duplicate map keys an error, regardless of 
whether the values in question were literals themselves or evaluated from an 
expression: 
https://github.com/clojure/clojure/commit/e6e39d5931fbdf3dfa68cd2d059b8e26ce45c965
A brief discussion in irc ensued 
(http://clojure-log.n01se.net/date/2010-04-05.html#10:56a), where Rich suggested 
that sets should probably be subject to the same rules as keys of maps (which, 
of course, they should be, whatever those rules may be, since a map's keys are 
always a set).
The final commit on the issue extended the error-checking of map keys to set 
values: 
https://github.com/clojure/clojure/commit/c733148ba0fb3ff7bbab133f5375422972e62d08
Note that the .createWithCheck variations of all of the collections in question 
are used by their "constructor" functions as well, e.g. hash-set, hash-map, and 
array-map:

=> (hash-set 1 2 2)
IllegalArgumentException Duplicate key: 2  
clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:80)
=> (hash-map 1 2 1 3)
IllegalArgumentException Duplicate key: 1  
clojure.lang.PersistentHashMap.createWithCheck (PersistentHashMap.java:92)
=> (array-map 1 2 1 3)
IllegalArgumentException Duplicate key: 1  
clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)

The only way to get around the checks here is to use `set` or `into`; note that 
there is no "constructor" function for an unsorted map that does not check that 
the provided keys are unique.
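
To illustrate the workaround, here is a minimal sketch; a and b are just 
placeholder names for values that happen to be equal at runtime:

```clojure
;; Placeholder values that happen to be equal at runtime.
(def a 1)
(def b 1)

;; The literal #{a b} would be rejected here as a duplicate key.
;; `set` and `into` drop the duplicate instead of throwing.
(def via-set  (set [a b]))        ; #{1}
(def via-into (into #{} [a b]))   ; #{1}
```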

Interestingly, sorted maps and sets do *not* have the same restriction:

=> (sorted-map 1 2 1 3)
{1 3}
=> (sorted-set 1 2 1)
#{1 2}

Quoting Rich from the mailing list thread linked above:

> These are bugs in user code. Map literals are in fact read as maps, so 
> a literal map with duplicate keys isn't going to produce an evaluated 
> map with distinct keys. If you create an array map with duplicate 
> keys, bad things will happen.

In the end, even though I've been recently "bitten" by the checked creation of 
sets from a literal, I think it's a reasonable approach.  In #{a b}, you are 
specifying the creation of a set containing two and exactly two values, those 
named by a and b.  There's an explicit invariant being specified in that code.  
Thinking over my various uses of #{} syntax, I remember times where I've 
expected it to enforce that invariant (and throw an error if dupes were 
provided) and times where I've expected it to implicitly apply `distinct` to 
the values provided; that indicates sloppiness on my part, not a place where 
the language should become psychic.

In contrast, `set` and `into` each accept a seqable collection of data, and are 
explicit about their support for sifting out duplicates (right in the docstring 
of `set`, and transitively so for `into` due to its use of `conj` and its 
semantics on sets).
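
If you want the "exactly these values" invariant for a runtime collection 
rather than a literal, you can state it yourself.  A rough sketch follows; 
`strict-set` is a hypothetical helper of my own naming, not anything in 
clojure.core:

```clojure
(defn strict-set
  "Hypothetical helper: like `set`, but throws if xs contains duplicates."
  [xs]
  ;; `distinct?` needs at least one argument, so guard the empty case.
  (if (or (empty? xs) (apply distinct? xs))
    (set xs)
    (throw (IllegalArgumentException. "Duplicate values supplied"))))

(strict-set [1 2 3])   ; all values distinct, so this is just (set [1 2 3])
(set [1 2 2])          ; plain `set` quietly drops the duplicate
```

Which of the two you reach for then documents which behaviour you meant.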

Finally, for the sake of consistency, it seems like the same checks should be 
applied by the sorted map and set "constructor" functions, and that there 
should be a map counterpart to `set` (i.e. a function that is the equivalent of 
#(into {} %), just as `set` is the equivalent of #(into #{} %)).  Naming this 
last one is problematic, though.
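
For concreteness, such a function might look like the sketch below.  The name 
`pairs->map` is purely a placeholder of mine (no such function exists in 
clojure.core), and it assumes its argument is a seqable of key/value pairs:

```clojure
(defn pairs->map
  "Hypothetical map counterpart to `set`: builds a map from a seqable of
  [key value] pairs.  For duplicate keys, the last value wins."
  [pairs]
  (into {} pairs))

(pairs->map [[1 2] [1 3]])   ; {1 3}
```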

Cheers,

- Chas

--
http://cemerick.com
[Clojure Programming from O'Reilly](http://www.clojurebook.com)
