Various interesting approaches to this problem... thanks guys!
Stu, you're right, distinct builds up a set of all items encountered,
which I don't need, and it's costly. But that function it is indeed a
good model. So here's a side effects free version modeled after
distinct:
(defn better-unique
[coll]
(let [step (fn step [xs last]
(lazy-seq
((fn [[f :as xs] last]
(when-let [s (seq xs)]
(if (= last f)
(recur (rest s) f)
(cons f (step (rest s) f)))))
xs last)))]
(step coll (Object.))))
Doing some micro-benchmarking, it turns out my initial version is the
fastest, but better-unique is comparable.
This one
(defn parti-by-unique [sc] (map first (partition-by identity sc)))
is much slower, and distinct is the slowest of the ones I tried, as it
does much more than I need. Here are the numbers (and I don't wanna
hear anybody laugh about the absolute values I achieve on my box ;)
user=> (time (count (unique (range 1000000))))
"Elapsed time: 1070.535608 msecs"
1000000
user=> (time (count (better-unique (range 1000000))))
"Elapsed time: 1510.92021 msecs"
1000000
user=> (time (count (parti-by-unique (range 1000000))))
"Elapsed time: 3344.861758 msecs"
1000000
user=> (time (count (distinct (range 1000000))))
"Elapsed time: 6724.705348 msecs"
1000000
And yes, for a generic function like this, that I want to use
everywhere and anywhere, I do consider performance.
better-unique is not too far off from my initial approach, but still,
why no side effects?
Even though I do not know when the side-effects happen (lazily,
eagerly), shouldn't the order be fixed in case of filter?
At least I'm assuming that every reset! of the last atom will happen
at a time so that the result of unique will not be affected.
On Feb 18, 11:34 pm, Stuart Halloway <[email protected]>
wrote:
> Rowdy's question asks for less than what core/distinct delivers--he
> only wanted to remove *adjacent* duplicates.
>
> That said, core/distinct is a better demo of "how do I maintain state
> across a lazy sequence without requiring any mutation?"
>
> Stu
>
> > Hi,
>
> > On Feb 18, 3:04 pm, Rowdy Rednose <[email protected]> wrote:
>
> >> "Returns a lazy sequence of the items in coll for which (pred item)
> >> returns true. pred must be free of side-effects."
>
> >> So that means I should not write a function like this:
>
> >> (defn unique [sc]
> >> "Returns a lazy sequence with all consecutive duplicates removed"
> >> (let [last (atom (Object.))]
> >> (filter #(let [ok (not= @last %)] (reset! last %) ok) sc)))
>
> >> user=> (unique [1 2 3 3 4 5 5 6])
> >> (1 2 3 4 5 6)
>
> >> But in contrast to functions that can be retried (compare-and-swap
> >> etc.), I don't immediately see why having side effects in filter
> >> would
> >> be bad. Can anybody enlighten me? And how should I do this instead?
>
> > Besides the other suggestions: clojure.core/distinct and its
> > implementation.
>
> > Sincerely
> > Meikel
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to [email protected]
> > Note that posts from new members are moderated - please be patient
> > with your first post.
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> >http://groups.google.com/group/clojure?hl=en
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en