On Wed, Oct 21, 2009 at 9:58 PM, Dmitry Kakurin <dmitry.kaku...@gmail.com> wrote:

> On Oct 21, 6:45 pm, John Harrop <jharrop...@gmail.com> wrote:
> > the reduction is wrapping the initial seq of empty vectors in ten
> > thousand layers of map ... fn ... invoke ... map ... etc.
> > Reducing a lazy sequence generator like map over a large sequence does
> > not work well in Clojure.
>
> I wonder if this could be improved using some internal queuing or
> trampoline or something.
>

Me, too.

> > Is there a reason not to use
> >
> > (defn multi-filter [filters coll]
> >   (map filter filters (repeat coll)))
> >
> > instead?
>
> Yes: in the real app the coll is coming from a file so I only want to
> iterate it once plus I don't really want to keep it all in memory (I
> never keep the head).


Your version keeps everything that passes any of the filters in memory,
which may not be much better depending on whether the filters tend to
include nearly everything, exclude nearly everything, or somewhere in
between.

So you probably want this instead:

(defn multi-filter [filters coll]
  (let [c (count filters)
        ignore (Object.)]
    (map
      (fn [i]
        (remove #(= % ignore)
          (take-nth c
            (drop i
              (mapcat #(map (fn [p e] (if (p e) e ignore)) filters (repeat c %))
                      coll)))))
      (range c))))

user=> (multi-filter [even? odd? even?] (range 10))
((0 2 4 6 8) (1 3 5 7 9) (0 2 4 6 8))
user=> (let [[a b c] (multi-filter [even? odd? even?] (range 10000))]
         [(count a) (count b) (count c)])
[5000 5000 5000]
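
To make the mechanism easier to see, here is a small sketch of the intermediate
interleaved sequence that multi-filter slices up. It assumes a tiny input,
(range 4), and uses the keyword :ignore as a stand-in sentinel purely for
readability; the real function uses a fresh (Object.) so the sentinel can never
collide with an element of coll. The names tagged and nth-output are made up
for this illustration.

```clojure
;; Same three predicates as in the REPL example above.
(def filters [even? odd? even?])
(def c (count filters))

;; For every element, emit one slot per filter: the element itself if the
;; predicate accepts it, otherwise the sentinel.
(def tagged
  (mapcat #(map (fn [p e] (if (p e) e :ignore)) filters (repeat c %))
          (range 4)))

tagged
;; => (0 :ignore 0 :ignore 1 :ignore 2 :ignore 2 :ignore 3 :ignore)

;; Output i is every c-th slot starting at offset i, minus the sentinels.
(defn nth-output [i]
  (remove #(= % :ignore) (take-nth c (drop i tagged))))

(map nth-output (range c))
;; => ((0 2) (1 3) (0 2))
```

Slot i of each group of c belongs to filter i, which is why drop plus take-nth
recovers one filter's results.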

Upside: this walks coll only once and doesn't stack overflow.

Downside: after you've consumed the first output sequence, the whole
sequence is held in memory (it holds onto the head). If you consume all of
the output sequences in tandem, however, the head can be discarded over
time. In this particular case, once you consume the first item from all
three sequences, the first few elements of the mapcat can be dropped and the
first few Integers from coll can be dropped.

Consuming the output sequences one at a time will require enough heap space
to hold all of coll.
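
A rough sketch of what consuming in tandem could look like, in case it helps.
tandem-counts is a made-up helper, not anything from core; it advances every
output sequence by one step per iteration and counts the items each one
yields. I have not measured the memory behaviour, and whether the head is
really released also depends on nothing else holding a reference to the
output sequences, so treat this as an illustration rather than a guarantee.

```clojure
;; multi-filter as given above, repeated so this snippet runs on its own.
(defn multi-filter [filters coll]
  (let [c (count filters)
        ignore (Object.)]
    (map
      (fn [i]
        (remove #(= % ignore)
          (take-nth c
            (drop i
              (mapcat #(map (fn [p e] (if (p e) e ignore)) filters (repeat c %))
                      coll)))))
      (range c))))

;; Hypothetical helper: step through all output seqs together, so no single
;; seq runs far ahead of the others.
(defn tandem-counts [seqs]
  (loop [ss     (vec seqs)
         counts (vec (repeat (count seqs) 0))]
    (if (every? empty? ss)
      counts
      (recur (mapv rest ss)
             ;; bump the count only for seqs that still had an item
             (mapv (fn [s n] (if (seq s) (inc n) n)) ss counts)))))

(tandem-counts (multi-filter [even? odd? even?] (range 10000)))
;; => [5000 5000 5000]
```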
