Re: generalize distinct

Eugen Dück Tue, 23 Feb 2010 04:18:02 -0800

I agree with you, Michal.

But let me rephrase the question; maybe my initial long-winded post
wasn't clear enough on that.


Rather than having a separate fn 'distinct-by' in addition to the
existing 'distinct', which, apart from the hard-coded keyfn, would be
EXACTLY the same, shouldn't we just generalize distinct to take an
optional keyfn and fall back to the current hard-coded behaviour
(i.e. identity) if no keyfn is specified? (Btw, I'm not considering my
other suggestion here, to allow the set to be passed in, which I
personally like, but also understand can be seen as non-orthogonal.)
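
A minimal sketch of what such a generalized distinct could look like,
modelled on the lazy, set-tracking approach of clojure.core/distinct.
The name distinct* and the keyfn-first argument order are assumptions
made for illustration only, not an existing or proposed API:

```clojure
;; Hypothetical name: distinct* avoids shadowing clojure.core/distinct.
(defn distinct*
  "Returns a lazy seq of the elements of coll, dropping any element whose
  (keyfn element) has already been seen. The first element for each key
  wins, so the order of coll is preserved. With no keyfn, the key is the
  element itself, i.e. the behaviour of plain distinct."
  ([coll] (distinct* identity coll))
  ([keyfn coll]
   (let [step (fn step [xs seen]
                (lazy-seq
                  (when-let [s (seq xs)]
                    (let [x (first s)
                          k (keyfn x)]
                      (if (contains? seen k)
                        ;; key already seen: skip this element
                        (step (rest s) seen)
                        ;; new key: emit the element and remember its key
                        (cons x (step (rest s) (conj seen k))))))))]
     (step coll #{}))))

(distinct* [1 2 1 3 2])               ;=> (1 2 3)
(distinct* class [1 2 :a :b 'a 'b])   ;=> (1 :a a)
```

The one-argument arity just delegates with identity, so the two
behaviours share a single code path.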

And this is not only true for distinct-by, it's true for some of the
other * / *-by pairs. It just seems this should be generalized in
order not to duplicate code for these cases. If we copy-and-paste
code, what's the justification? I'd say orthogonality is an argument
against copy-and-paste. Do we copy and paste rather than generalize
just to make distinct a little bit faster by keeping it hard-coded?

Eugen

On Feb 23, 5:13 am, Michał Marczyk <michal.marc...@gmail.com> wrote:
> On 22 February 2010 20:28, Sean Devlin <francoisdev...@gmail.com> wrote:
>
> > Then is the seq (1 :a a) guaranteed?  How do I know that I won't get
> > (2 :b b), (1 :b c), etc?  What if I want a specific combination
> > instead?  I've had to actually code this specific problem, and I found
> > that using group-by & some secondary mapping operation was the only
> > thing that gave me the flexibility I needed (manufacturing is fun!).
>
> The ordering guarantees distinct-by makes are exactly those that
> distinct makes, because it uses the same code (as mentioned
> previously, I lifted it all from clojure.core, then tweaked to take
> the keyfn / eqfn into account). Basically this means that if your
> collection has an intrinsic ordering, it will be preserved (the result
> will include, for each equivalence class of items from the sequence
> modulo the user-defined equivalence relation, the one earliest w.r.t.
> that ordering). If it's a hash-map or a hash-set instead, you'll get
> whatever ordering (seq coll) happens to produce.
>
> As for group-by giving you more flexibility -- well, it gives you a
> lot of flexibility where it's appropriate to use it, but because of
> its choice of data structure for the result, you can't use it to
> reimplement distinct-by directly:
>
> user=> (group-by class [1 2 3 :a :b :c 'a 'b 'c])
> java.lang.ClassCastException: java.lang.Class cannot be cast to
> java.lang.Comparable (NO_SOURCE_FILE:0)
>
> So no way to use non-Comparables as keys...
>
> And then there's the fact that you can't tell in which order the keys
> discovered by group-by appeared in the original collection, which is
> again because of its use of sorted-map, which has the consequence that
> order is being mangled on purpose! E.g.:
>
> user=> (seq (group-by #(- %) [1 2 3 4 5]))
> ([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]])
>
> In other words: (seq (group-by f coll)) has an ordering possibly
> completely unrelated to that of coll (so you'd have to make a separate
> traversal through the coll to discover the original ordering of the
> keys), whereas (distinct-by f coll), for either version of
> distinct-by, preserves the ordering of coll. That's a desirable
> property for when that's what you want to do, whereas group-by will, I
> suppose, be more useful on other occasions. ;-)
>
> To sum it up, (1) distinct-by actually behaves in a very predictable
> way (which may or may not be useful for any particular purpose), (2)
> it cannot be implemented directly in terms of group-by. I'd say it's
> pretty orthogonal to the existing library functions (that I know of)
> actually...
>
> Sincerely,
> Michał

