> * When we fork(), we lose sharing. *Any* lazy computation which
> passes to both children is going to penalize you, sometimes in very
> surprising ways.
I'm not sure I understand why loss of sharing is the problem - losing
sharing for pure computations is by no means a disaster, it just means
that some work is duplicated.
Here's another way to look at the problem with hGetContents:
- hGetContents returns a stream whose value may be affected
by subsequent I/O operations
- evaluating the stream returned by hGetContents may perform
some I/O
If these two effects are connected to each other, as they are in the
fork() example, then evaluating one "pure" value changes the subsequent
value of another. This is clearly not referentially transparent; it's
not just a loss of sharing.
It's not just fork() that suffers from this interaction - you can get
the same effect just using Posix.dupFd & Posix.fdToHandle (although it's
probably not possible using just standard Haskell 98).
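The dupFd/fdToHandle interaction can be sketched like this (a minimal
sketch, using the modern System.Posix.IO names `dup`, `handleToFd` and
`fdToHandle` in place of the old Posix module's; the file name and
contents are made up for illustration):

```haskell
import System.IO
import System.Posix.IO (dup, fdToHandle, handleToFd)

main :: IO ()
main = do
  writeFile "demo.txt" (unlines (map show [1 .. 1000 :: Int]))
  h   <- openFile "demo.txt" ReadMode
  fd1 <- handleToFd h          -- note: handleToFd closes h, yielding the raw fd
  fd2 <- dup fd1               -- fd2 shares a file offset with fd1
  h1  <- fdToHandle fd1
  h2  <- fdToHandle fd2
  hSetBuffering h1 NoBuffering -- so forcing the stream reads only what's demanded
  s   <- hGetContents h1       -- "pure" lazy stream over fd1
  putStrLn (take 10 s)         -- forcing part of the stream performs I/O on fd1...
  -- ...which advances the shared offset, so what h2 reads next depends
  -- on how much of the "pure" value s happens to have been evaluated.
  hGetLine h2 >>= putStrLn
```

No fork() in sight, yet evaluating a pure value changes the result of a
subsequent I/O operation on a different handle.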
> Thus, raw access to fork is guaranteed to be the wrong thing for
> nearly everybody all the time. It's probably worth noting this
> prominently next to any and all documentation for fork, and next to
> its code. Why? Because use of fork is part of the commonly accepted
> idiom for running one program from within another. It's likely
> programmers who've done this in other languages will go looking for
> "fork" rather than some nicer, higher-level functionality (POpen?)
> that has seqs in all the right places and actually does what they
> want.
Yes, I agree we should stick large red notices next to
Posix.forkProcess.
> That said, I'd love to have lazy I/O that actually works right, if
> only because it actually *does* do the right thing for the 95% of the
> programs which get written which *aren't* doing fancy I/O. I say this
> having written programs which use lazy I/O to process files which are
> much larger than the total virtual memory on my machine (so mmap-ing
> regular files to snapshot their contents isn't going to be good enough
> for me, even if it works for smaller files).
>
> It seems to me part of the problem is that lazy I/O results in
> concurrency, and concurrency is hard. This is particularly true as
> the lazy I/O routines don't say "WARNING! CONCURRENCY" all over the
> place. Does this mean we should make semi-closed handles untouchable?
> Should there be rules that turn "lines . getContents" into something
> vaguely sensible and non-lazy? Should we say something sensible about
> the behavior of anything concurrent-ish across fork? Simon, what
> would it take to make you stop worrying and love lazy I/O? :-)
If there were a form of lazy I/O which didn't require the programmer to
reason about the evaluatedness of the lazy stream, that would remove
most of my complaints. I agree that lazy I/O results in concurrency -
but it's a particularly intractable form of concurrency because it
involves interaction between the IO and pure parts of the program.
Concurrent Haskell is "easy" by comparison: everything's in the IO
monad.
One example that crops up regularly is trying to lazily read a large
number of files - if you're not careful, you run out of file
descriptors. How do you avoid running out of file descriptors? Well,
you make sure the lazy stream is fully evaluated. That requires adding
seq or otherwise reasoning about whether we've evaluated the stream to
the end. If you're going to use seq, then that defeats the purpose of
lazy I/O in the first place, and reasoning about the evaluatedness of
values is made hard by Haskell's underspecification of the evaluation
order.
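A minimal sketch of that trap and the usual "fix" (function names and
file paths are hypothetical, just for illustration):

```haskell
import System.IO
import Control.Monad (forM)

-- Reads many files lazily; each hGetContents leaves its Handle
-- (and hence its file descriptor) open until the stream happens to
-- be evaluated to the end - open enough files and you run out.
readAllLazy :: [FilePath] -> IO [String]
readAllLazy paths = forM paths $ \p -> do
  h <- openFile p ReadMode
  hGetContents h              -- descriptor stays open until the stream is forced

-- The "fix": force each stream to EOF before moving on - which is
-- exactly the seq-based reasoning that defeats lazy I/O's purpose.
readAllForced :: [FilePath] -> IO [String]
readAllForced paths = forM paths $ \p -> do
  h <- openFile p ReadMode
  s <- hGetContents h
  length s `seq` hClose h     -- evaluate to the end, then close the descriptor
  return s
```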
IMHO if you find you need to worry about these things in your program,
then you should switch to non-lazy I/O. Let's keep IO in the IO monad!
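For comparison, a sketch of the non-lazy alternative (the function name
is made up): process each file inside withFile, entirely in the IO
monad, so the descriptor's lifetime is explicit, space usage is bounded,
and no lazy stream escapes.

```haskell
import System.IO

-- Process a large file line by line in the IO monad: the Handle is
-- closed when withFile returns, and no "pure" stream outlives it.
processFile :: (String -> IO ()) -> FilePath -> IO ()
processFile act path = withFile path ReadMode loop
  where
    loop h = do
      eof <- hIsEOF h
      if eof
        then return ()
        else hGetLine h >>= act >> loop h
```

Mapping this over many files never holds more than one descriptor open,
with no seq and no reasoning about evaluation order.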
Cheers,
Simon
_______________________________________________
Glasgow-haskell-bugs mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs