On Tue, Aug 2, 2011 at 4:47 PM, David Nolen <dnolen.li...@gmail.com> wrote:
> On Tue, Aug 2, 2011 at 4:43 PM, Trastabuga <lisper...@gmail.com> wrote:
>>
>> I just came across the issue of getting the StackOverflowError in the
>> function reading long file and recursively building a list of data.
>> After I replaced function name with "recur" the problem went away.
>> Hence a couple of questions. It's the programmer's responsibility to
>> put recur instead of function names where tail-recursion is available?
>> (Kind of obvious, but then why the compiler wouldn't detect it?)
>> If tail-recursion is not available and function name has to be called,
>> how do I deal with stack overflow in this case?
>
> Clojure does not have tail call optimization. You must use recur,
> trampoline, or redesign your code around lazy sequences.
> David

More specifically, you must use recur when your call chain is:

a -> a -> a ...

You'll need trampoline for more elaborate cases of mutually recursive
functions, like

a -> b -> c -> a

or even

a -> b -> a -> c -> d -> b -> a

(where clearly a at least has conditional branches that can lead to
tail calls to at least either b or c depending).

Lazy sequences provide another way to do this. For example, suppose
you have a set of functions a, b, c ... and a state machine specified
with a map like {\a {\a b \b d \c q ...} \b {\a c \b a \c d ...} ...}
(as might be used if for some reason you wanted your own homegrown
regular expression implementation), you might implement this with
trampoline using

(defn a [in out]
  (if in
    (let [new-state ((state-map (first in)) \a)]
      #(new-state (next in) (conj out whatever)))
   (list out)))

...

(defn process [in]
  (apply str (first (trampoline a (seq in) []))))

or something like that. You might also use lazy-seq:

(defn a [in]
  (lazy-seq
    (if in
      (let [new-state ((state-map (first in)) \a)]
        (cons whatever (new-state (next in)))))))

...

(defn process [in] (apply str (a in)))

though you might also use the higher-level reduce:

(def state-to-letter {a \a b \b ...})

(defn a [ch]
  whatever)

...

(defn process [in]
  (apply str
    (second
      (reduce
        (fn [[[current-state out] ch]
          (let [new-state ((state-map ch)
                          (state-to-letter current-state))]
            [new-state (conj out (current-state ch))]))
        [a []]
        in))))

or a low-level loop/recur:

(defn process [in]
  (loop [in (seq in) out [] current-state a]
    (if in
      (let [ch (first in)
            new-state ((state-map ch)
                        (state-to-letter current-state))]
        (recur (next in) (conj out (current-state ch)) new-state))
      (apply str out))))

(Note: untested, and the "whatever"s are presumed to depend on the
state and the current input character, in such a way as to be hard to
replace with just a map lookup.)

Note that all of the above find some way to turn a calls b calls a
calls c calls ... into something sequentially calls a, then b on a's
return value, then a on that value, then c on that value ... and thus
avoids deeply nested stack frames.

In the case of trampoline, trampoline has the behavior that if the
function it's passed returns a new function, it calls that (with no
arguments), and repeats until a non-function return value is
generated. (In the state machine this has the awkward effect of
requiring the vector a returns if it's reached the end of in to be
wrapped in a list, since vectors are callable as functions and
trampoline might therefore call it instead of return it.) Since
functions can be selected programmatically in a functional language,
the state machine can be almost directly encoded in functions, or less
directly using a transition map as above.

Lazy-seq is good for cases like the above where each step is appending
to some growing output sequence. Then you just write each step almost
as you would with plain recursion, returning (cons this-steps-result
(next-step args)), but the whole thing is wrapped in (lazy-seq ...).
The seq is finite if and only if at some point one of the steps
returns nil instead of a cons. In this case that happens when the
input is exhausted. The seq generated is (step-1s-result
step-2s-result step-3s-result ...) and since it's lazy steps are only
performed as the consumer needs more items, and are called
sequentially, rather than nesting. (Technically, each step ends up
returning a delay rather than the cons, and the delay gets forced when
the step's output is needed by the consumer, which triggers the next
step, which produces another delay for the consumer to unwrap later;
so each step returns a delay instead of directly calling the next, and
the caller calls the next by forcing the delay, which turns the chain
of tail calls again into iteration.)

Reduce is generally useful for an iteration that accumulates a result,
in this case a vector. Direct use of loop/recur is also useful
sometimes for iteration.

Note that the trampoline, reduce, and loop/recur versions are non-lazy
-- the output is generated all at once, though in constant stack
space, and returned as a complete object; whereas the lazy-seq version
is lazy. The lazy-seq version can be emulated at a bit higher level by
using the iterate function; this is left as an exercise for the
reader. :) Hint: it's actually easier to convert the reduce version
than the lazy-seq version.

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to