Wow! All indentation is gone :-( I'll repost it. Christophe Grand a écrit : > On my spare time, I've been thinking (and trying) to port Left-Fold > Enumerators (http://okmij.org/ftp/Streams.html) to Clojure for a week or > so — incidentally > (http://clojure-log.n01se.net/date/2008-11-06.html#20:41) I learnt that > Rich is also thinking about them. > > Here is a report of my trials and errors on the way to implementing LFE > in Clojure. You can go straight to step 5 for actual working code and a > repl session. > (Abstract: I learnt clever tricks using continuations or the Y > combinator and turned them in a lousy map.) > > > > STEP 0: > The main idea behind LFE (compared to lazy IO or imperative IO) is to > introduce an inversion of control: removing resources (file handles > etc.) allocation and iteration code from user code. > eg: > > (defn count-words [filename] > > (with-open rdr (-> filename java.io.FileReader. java.io.BufferedReader.) > > (reduce (fn [c line] (+ c (-> line (.split " ") count))) 0 > > (line-seq rdr)))) > > can be rewritten using an enumerator: > > (defn count-words [filename] > > ((enum-lines filename) (fn [c line] (+ c (-> line (.split " ") > count))) 0)) > > or, introducing an ereduce function: > > (defn ereduce [f seed e] > > (e f seed)) > > > > (defn count-words [filename] > > (let [io-enum (enum-lines filename)] > > (ereduce (fn [c line] (+ c (-> line (.split " ") count))) 0 io-enum))) > > where enum-lines returns a function which "reduces" the lines of the > specified file using the function and the seed passed as arguments. > Making enumerators implement clojure.lang.IReduce would allow us to > directly use Clojure's reduce with enumerators. > > One can build an enumerator from collections: > > (defn enum-coll [coll] > > (fn [f seed] > > (reduce f seed coll))) > > So far so good. One can even define basic combinators and predicates on > enumerators: econcat, etake, edrop, efilter, emap (only on one enum). > For example: > > (defn econcat [e1 e2] > > (fn [f seed] > > (e2 f (e1 f seed)))) > > > > (defn edrop [n e] > > (fn [f seed] > > (first (e (fn [[acc i] x] (if (pos? i) [acc (dec i)] [(f acc x) 0])) > [seed n])))) > > > > (defn efilter [pred e] > > (fn [f seed] > > (e #(if (pred %2) (f %1 %2) %1) seed))) > > > > (defn etake [n e] > > (fn [f seed] > > ((e > > (fn [[acc n :as a] x] (if (pos? n) [(f acc x) (dec n)] a) > > [seed n]) 0))) > > > > (def enil ;the empty enumerator > > (fn [f seed] > > seed)) > > > > (def enil? [e] > > (e (constantly false) true)) > > One can also build enumerators based on java.nio which would be reduced > on an IO thread. (Since (reduce f seed coll) is roughly equivalent to > (let [a (agent seed)] (doseq [x coll] (send a f x)) (await a) @a) > integration between multiplexed enumerators and agent is relatively > straightforward.) > > There are two drawbacks to this simple design: > * the enumerated collection must be entirely processed (no early > termination, see enil? or etake), > * no "parallel loops" (btw, there's not that much facility for "parallel > traversing" in Clojure: there's map... and map; it could be a job for > the vector-binded doseq (and then we would need a dofor...)). > > Adding early exit means designing a protocol for the reducing function > to signal to the enumerator to stop there. It's not a tough point, I > won't bother about it until the end of this post. > > > > STEP 1: > Concerning "parallel loops" (that is an emap that works on several > enumerators at once, like map does with seqs) I tried to come up with a > solution using one of Oleg Kiselyov's tricks to turn an enumerator > inside out. > The trick is to use a non recursive version of the enumerator — this non > recursive enumerator takes the (recursive) enumerator as its first > argument. (Btw, thanks to the recur special form, some clever macrology > can help building a non recursive enumerator from the source of a > recursive one.) > > With a non-recursive enumerator, ereduce becomes: > > (defn ereduce [f seed non-rec-enum] > > (let [good-old-enum (fn self [f seed] (non-rec-enum self f seed))] > > (good-old-enum f seed))) > > But it smells like blown up stack. Let's trampoline! > > > (defn ereduce [f seed non-rec-enum] ; with trampolining > > (let [ccall (fn self [& args] (cons self args)) ; ccall = capture call > > ccall? #(and (seq? %) (= (first %) ccall))] > > (loop [r (non-rec-enum ccall f seed)] > > (if (ccall? r) > > (recur (apply non-rec-enum r)) > > r)))) > > Back to emap: > > (defn emap [f & nres] ;nres = non recursive enumerators > > (fn [g seed] > > (let [ccall (fn self [& args] (cons self args)) > > ccall? #(and (seq? %) (= (first %) ccall))] > > (loop [rs (map #(%1 ccall (fn [_ x] x) nil) nres) acc seed] > > (if (every ccall? rs) > > (recur > > (map #(%1 ccall (fn [_ x] x) nil) nres) ; déjà vu! Does not sound > good, see below > > (g acc (apply f (map last rs)))) > > acc))))) ; here I should consume all the enums til the end to ensure > correct resource freeing... or trigger early exit or something > > > > STEP 2: > Well that's not that easy, there's an elephant in the room since I > introduced the non recursive enumerator: implicit state! (and yes, I > know, emap does not return a non recursive enumerator but an enumerator) > So the non recursive enumerator must take and pass another arg (its > state) and I need a way to initialize the state. I chose to use > different arities to tell the first call from the others. > > (defn a-non-rec-enum > > ([self f seed] ;first call > > ......) > > ([self state f acc] ;subsequent calls > > ......)) > > The trampolining ereduce does not need to be modified to pass the > enumerator's state around. > Here is the new emap: > > (defn emap [f & nres] ; now with explicit state handling -- and > returns a non rec enum > > (let [ccall (fn self [& args] (cons self args)) > > ccall? #(and (seq? %) (= (first %) ccall)) > > common (fn [rs self g acc] ; > > (if (every ccall? rs) > > (self > > (map second rs) ; compound state > > g > > (g acc (apply f (map last rs)))) > > acc))] ; still TODO: add proper termination > > (fn > > ([self g seed] > > (let [rs (map #(%1 ccall (fn [_ x] x) nil) nres)] > > (common rs self g seed))) > > ([self state g acc] > > (let [rs (map #(%1 ccall %2 (fn [_ x] x) nil) nres state)] ; no more > elephant > > (common rs self g seed)))))) > > > > > STEP 3: where I'm doing it wrong (please skip to STEP 5) > > I'm not happy with ignoring an arg (in (fn [_ x] x)) and passing this g > function around. I think it's time to redefine what constitutes the job > of a non recursive enumerator: enumerating, not reducing over enumerated > values. > > (defn a-non-rec-enum-that-does-no-reducing ; returns [state x] where > x is the last enumerated item or returns eoe when at end > > ([eoe] ;first call > > ......) > > ([state eoe] ;subsequent calls > > ......)) > > Reducing is ereduce's job: > > (defn ereduce [f seed non-rec-enum] > > (let [eoe (Object.)] > > (loop [r (non-rec-enum eoe) > > acc seed] > > (if (identical? eoe r) > > acc > > (recur > > (non-rec-enum (first r) eoe) > > (f acc (second r))))))) > > and then emap is simplified to: > > (defn emap [f & nres] > > (let [common (fn [eoe rs] > > (if (some #(identical? eoe %) rs) > > eoe ; still TODO: add proper termination > > [(map first rs) ; compound state > > (apply f (map second rs))]))] > > (fn > > ([eoe] > > (let [rs (map #(%1 eoe) nres)] > > (common eoe rs))) > > ([state eoe] > > (let [rs (map #(%1 %2 eoe) nres state)] > > (common eoe rs)))))) > > > > STEP 4: Still in error > > I think I'm drifting too far away from original LFEs and still there are > two unattended problems: > * early termination, > * proper termination of enumerators in emap. > > Hopefully, they are related: they both call for a way to clean the state > and resources of an enumerator. This time it won't fit using an arity > overloading. Let's turn the NRE into a map: > > {:enumerators/init (fn [eoe] ; returns [state x] or [state eoe] > > .......) > > :enumerators/step (fn [state eoe] ; returns [state x] or [state eoe] > > .......) > > :enumerators/clean (fn [state] ; the new one > > .......)} > > (As I'm writing, I realize that I could get rid of all these eoes by > returning only [state]. > Right now I'll continue to return [state eoe] since it feels more regular.) > > ereduce and emap, one again: > > (defn ereduce [f seed non-rec-enum] > > (let [eoe (Object.)] > > (loop [[state x] ((non-rec-enum :enumerators/init) eoe) > > acc seed] > > (if (identical? eoe x) > > (do ; argh :-) > > ((non-rec-enum :enumerators/clean) state) > > acc) > > (recur > > ((non-rec-enum :enumerators/step) state eoe) > > (f acc x)))))) > > > > (defn emap [f & nres] > > (let [inits (map :enumerators/init nres) > > steps (map :enumerators/step nres) > > cleans (map :enumerators/clean nres) > > common (fn [eoe rs] > > (let [state (map first rs) > > vals (map second rs)] > > [state > > (if (some #(identical? eoe %) vals) > > eoe > > (apply f vals))])) > > {:enumerators/init (fn [eoe] > > (let [rs (map #(%1 eoe) inits)] > > (common eoe rs))) > > :enumerators/step (fn [state eoe] > > (let [rs (map #((%1 : %2 eoe) steps state)] > > (common eoe rs))) > > :enumerators/clean (fn [state] > > (dorun (map #(%1 %2) cleans state))})) > > > It's interesting to notice that, at this point, etake does not need > early termination > > (defn etake [n e] > > {:enumerators/init (fn [eoe] > > (if (pos? n) > > (let [[state x] ((e :enumerators/init) eoe)] > > [[state (dec n)] x]) > > [[state 0] eoe])) > > :enumerators/step (fn [[state n] eoe] > > (if (pos? n) > > (let [[state x] ((e :enumerators/step) state eoe)] > > [[state (dec n)] x]) > > [[state 0] eoe])) > > :enumerators/clean (fn [[state _]] > > ((e :enumerators/clean) state)) > > and that multiplexing has been lost while trying to bring a "parallel > map". In fact, I lost good multiplexing potential at STEP 3: since then > ereduce apply the reducing function at each step, which implies that > each step must now return a new value while, before STEP 3, the only > purpose of a "step" was to advance the computation (think about filtered > enumerators: the computation can advance without calling the reducing > function). > I'm afraid that I reversed the inversion of control :-( and this is why > the last etake function was interesting: because it does not involve the > reducing function anymore (see STEP 0). > > > > STEP 5: back on track > > Now I define an enumerator as three functions: > > (def a-non-rec-enum > > {:create-state (fn [] ; returns a state value > > ......) > > :step (fn [self state f acc] ; returns a ccall or [result > final-state], it's the regular part of an enumerator as of STEP 2 > > ......) > > :destroy-state (fn [state] ; destroy the state (release resources) > > ......)}) > > The updated ereduce: > > (defn ereduce [f seed non-rec-enum] > > (let [ccall (fn self [& args] (cons self args)) > > ccall? #(and (seq? %) (= (first %) ccall)) > > state ((non-rec-enum :create-state)) > > nre-step (non-rec-enum :step)] > > (loop [args [ccall state f seed]] > > (let [r (apply nre-step args)] > > (if (ccall? r) > > (recur r) > > (let [[result state] r] > > ((non-rec-enum :destroy-state) state) > > result)))))) > > emap: (I'm pretty sure there's a clever/more functional way to write emap) > > (defn emap [f & nres] > > (let [ccall (fn self [& args] (cons self args)) > > ccall? #(and (seq? %) (= (first %) ccall)) > > nre-steps (map :step nres) > > marker (Object.)] > > {:create-state (fn [] (map > > (fn [nre] (ccall > > ((nre :create-state)) > > (fn [_ x] x) > > marker)) > > nres)) > > :step (fn [self state g acc] > > (if (every? ccall? state) > > (if (some #(identical? marker (last %)) state) > > (self (map #(if (identical? marker (last %2)) (apply %1 %2) %2) > nre-steps state) g acc) ; advance enumerators that haven't "produced" a > value yet > > (self (map (fn [step [self state f _]] (step self state f marker)) > nre-steps state) g (g acc (apply f (map last state))))) ; all > enumerators are ready: reduce and advance them all! > > [acc state])) > > :destroy-state (fn [state] > > (dorun (map #((%1 :destroy-state) (second %2)) nres state)))})); a > "parallel" doseq would be handy :-) > > helper function to turn a seqable into an enumerator: > > (defn enum-seq [c] > > {:create-state (fn [] (seq c)) > > :step (fn [self state f acc] > > (if-let [fst & rst] state > > (self rst f (f acc fst)) > > [acc nil])) > > :destroy-state (fn [state] )}) > > and another to turn an enumerator into a vector: > > (defn v [e] (ereduce conj [] e)) > > efilter: > > (defn efilter [pred {:keys [create-state step destroy-state]}] > > {:create-state create-state > > :step (fn [self state f acc] > > (step self state #(if (pred %2) (f %1 %2) %1) acc)) > > :destroy-state destroy-state}) > > a simple io-based enumerator constructor: > > (defn enum-lines [filename] > > {:create-state (fn [] (-> filename java.io.FileReader. > java.io.BufferedReader.)) > > :step (fn [self #^java.io.BufferedReader rdr f acc] > > (if-let line (.readLine rdr) > > (self rdr f (f acc line)) > > [acc rdr])) > > :destroy-state (fn [#^java.io.BufferedReader rdr] > > (.close rdr))}) > > Ok, let's test them: > > user=> (def e1 (enum-seq (range 30))) > > #'user/e1 > > user=> (def e2 (enum-seq "ABCDEFGHIJ")) > > #'user/e2 > > user=> (v (emap vector e1 e2)) > > [[0 \A] [1 \B] [2 \C] [3 \D] [4 \E] [5 \F] [6 \G] [7 \H] [8 \I] [9 \J]] > > user=> (v (efilter #(zero? (rem % 2)) e1)) > > [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28] > > user=> (v (emap vector (efilter #(zero? (rem % 2)) e1) e2)) > > [[0 \A] [2 \B] [4 \C] [6 \D] [8 \E] [10 \F] [12 \G] [14 \H] [16 \I] > [18 \J]] > > user=> (v (enum-lines "/tmp/test-enums")) > > ["This is line one," "this is line two," "this is line three," "this > is line four," "this is line five."] > > user=> (v (efilter #(= 18 (count %)) (enum-lines "/tmp/test-enums"))) > > ["this is line four," "this is line five."] > > That's all for this post. I haven't adressed early termination yet but I > haven't even made my mind between the reducing function returning an > out-of-band value or throwing a canned Throwable (I don't consider the > option of having the reducing function to return [mustStop value] > because I'd like to be able to pass "normal" fuctions.) > > The idea would be something along these lines: > > (def enil? [e] > > (ereduce #(fn [x _] (if x false (stop-enumeration))) true)) > and not: > > (def enil? [e] > > (ereduce #(fn [_ _] (stop-and-return x)) true)) > > I think that one can write a function to transform a basic enumerator > (such as those returned by enum-seq or enum-lines) into an > early-termination-aware one. > About multiplexing, there are some details to sort out but I think that > enumerators returned by emap can be made multiplexable if the > mapped-over enumerators also are multiplexable. > > Christophe >
--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---
