On 22 February 2010 20:28, Sean Devlin <francoisdev...@gmail.com> wrote:
> Then is the seq (1 :a a) guaranteed?  How do I know that I won't get
> (2 :b b), (1 :b c), etc?  What if I want a specific combination
> instead?  I've had to actually code this specific problem, and I found
> that using group-by & some secondary mapping operation was the only
> thing that gave me the flexibility I needed (manufacturing is fun!).

The ordering guarantees distinct-by makes are exactly those that
distinct makes, because it uses the same code (as mentioned
previously, I lifted it all from clojure.core, then tweaked to take
the keyfn / eqfn into account). Basically this means that if your
collection has an intrinsic ordering, it will be preserved: for each
equivalence class of items under the user-defined equivalence
relation, the result keeps the item that comes earliest in that
ordering. If it's a hash-map or a hash-set instead, you'll get
whatever ordering (seq coll) happens to produce.
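
To make the ordering claim concrete, here is a minimal sketch of what
a keyfn-only distinct-by could look like when modelled on
clojure.core/distinct. It is an illustration, not the code posted
earlier in the thread: the eqfn variant is left out, and the argument
order (keyfn first) is an assumption.

```clojure
;; Sketch only: same shape as clojure.core/distinct, except that the
;; "seen" set holds (keyfn x) instead of x itself.
(defn distinct-by
  [keyfn coll]
  (let [step (fn step [xs seen]
               (lazy-seq
                 ((fn [[x :as xs] seen]
                    (when-let [s (seq xs)]
                      (let [k (keyfn x)]
                        (if (contains? seen k)
                          ;; key already seen: skip x
                          (recur (rest s) seen)
                          ;; first item with this key: keep it, in place
                          (cons x (step (rest s) (conj seen k)))))))
                  xs seen)))]
    (step coll #{})))

(distinct-by class [1 2 3 :a :b :c 'a 'b 'c])
;; => (1 :a a)
```

Because items are emitted in the order they are encountered and only
later duplicates are dropped, the first item of each class is the one
that survives.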

As for group-by giving you more flexibility -- well, it gives you a
lot of flexibility where it's appropriate to use it, but because of
its choice of data structure for the result, you can't use it to
reimplement distinct-by directly:

user=> (group-by class [1 2 3 :a :b :c 'a 'b 'c])
java.lang.ClassCastException: java.lang.Class cannot be cast to
java.lang.Comparable (NO_SOURCE_FILE:0)

So there's no way to use non-Comparables as keys...
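
The failure can be reproduced without group-by itself. Below is a
hypothetical sorted-group-by (the name and body are made up for this
example, not taken from the library source) that accumulates its
groups in a sorted-map, which is what the group-by discussed here is
said to do.

```clojure
;; Hypothetical stand-in for a sorted-map-based group-by.
(defn sorted-group-by
  [f coll]
  (reduce (fn [m x]
            (let [k (f x)]
              ;; looking up / inserting k compares it with existing keys
              (assoc m k (conj (get m k []) x))))
          (sorted-map)
          coll))

;; Keys that are numbers compare fine:
(sorted-group-by #(- %) [1 2 3])

;; Class objects do not implement Comparable, so the second distinct
;; key triggers a ClassCastException in the default comparator:
(try
  (sorted-group-by class [1 :a])
  (catch ClassCastException e
    (println "cannot compare keys:" (.getMessage e))))
```

The exception comes from sorted-map's comparator, so any function
that builds its result this way inherits the same restriction on keys.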

And then there's the fact that you can't tell in which order the keys
discovered by group-by appeared in the original collection. That is
again because of its use of sorted-map: the result is sorted by key,
so the original order is being mangled on purpose! E.g.:

user=> (seq (group-by #(- %) [1 2 3 4 5]))
([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]])

In other words: (seq (group-by f coll)) has an ordering possibly
completely unrelated to that of coll (so you'd have to make a separate
traversal through the coll to discover the original ordering of the
keys), whereas (distinct-by f coll), for either version of
distinct-by, preserves the ordering of coll. That's a desirable
property when you need the original order, whereas group-by will, I
suppose, be more useful on other occasions. ;-)
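
The "separate traversal" mentioned above can be sketched like this.
The helper name first-per-key is made up for the example, and the
sketch assumes that group-by accepts the keys f produces (Comparable
ones, for a sorted-map-based group-by) and that each group keeps its
items in coll order, as in the output shown earlier.

```clojure
;; Sketch: recover first-appearance order with a second pass over coll.
(defn first-per-key
  [f coll]
  (let [groups (group-by f coll)         ; pass 1: key -> items
        ks     (distinct (map f coll))]  ; pass 2: keys in original order
    (map #(first (get groups %)) ks)))

(first-per-key #(mod % 3) [5 1 2 3 4])
;; => (5 1 3)
```

So group-by can be pressed into service, but only with extra work on
top, which is why it is not a direct replacement for distinct-by.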

To sum it up, (1) distinct-by actually behaves in a very predictable
way (which may or may not be useful for any particular purpose), and (2)
it cannot be implemented directly in terms of group-by. I'd say it's
pretty orthogonal to the existing library functions (that I know of)
actually...

Sincerely,
Michał
