Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-21 Thread Claus Reinke

[trigger garbage collection when open runs out of free file descriptors, then 
try again]

so, instead of documenting limitations and workarounds, this issue should be
fixed in GHC as well.


This may help in some cases but it cannot be relied upon. Finalizers are
always run in a separate thread (must be, see
http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if
you force a GC when handles are exhausted, as hugs seems to do, there is no
guarantee that by the time the GC is done the finalizers have freed any
handles (assuming that the GC run really detects any handles to be
garbage).


useful reference to collect!-) but even that mentions giving back os resources 
such
as file descriptors as one of the simpler cases. running the GC/finalizers 
sequence
repeatedly until nothing more changes might be worth thinking about, as are 
possible
race conditions. here is the thread the paper is refering to as one of its 
origins:

   http://gcc.gnu.org/ml/java/2001-12/msg00113.html
   http://gcc.gnu.org/ml/java/2001-12/msg00390.html

i also like the idea mentioned as one of the alternatives in 3.1, where the 
finalizer does
not notify the object that is to become garbage, but a different manager 
object. in this
case, one might notify the i/o handler, and that could take care of avoiding 
trouble.

in my opinion, if my code or my finalizers hold on to resources i'd like to see 
freed,
then i'm responsible, even if i might need language help to remedy the 
situation.
but if i take care to avoid such references, and the system still runs out of 
resources
just because it can't be bothered to check right now whether it has some left 
to free,
there is nothing i can do about it (apart from complaining, that is!-).

of course, this isn't new. see, for instance, this thread view:
http://groups.google.com/group/fa.haskell/browse_thread/thread/2f1f855c8ba33a5/74d32070dbcc92fc?lnk=stq=hugs+openFile+file+descriptor+garbage+collectionrnum=1#74d32070dbcc92fc

where Remi Turk points out System.Mem.performGC, and Simon Marlow
agrees that GHC should do more to free file descriptors, but also mentions that
performGC doesn't run finalizers.

actually, if i have readFile-based code that immediately processes the file 
contents
before the next readFile, as in Matthew's test code, my ghci (on windows) 
doesn't
seem to run out of file descriptors easily, but if i force a descriptor leak by 
leaving
unreferenced contents unprocessed, then performGC does seem to help (not that
this is ideal in general, as discussed in the thread above):

   import System.Environment
   import System.Mem
   import System.IO

   main = do
 n:f:_ - getArgs
 (sequence (repeat (openFile f ReadMode))  return ()) `catch` (\_-return 
())
 test1 (take (read n) $ repeat f)

   test1 files = mapM_ doStuff files where
 doStuff f = {- performGC  -} readFile f = print.map length.take 
10.lines

interestingly, if i do that, even Hugs seems to need the performGC?

claus

ps. one could even try to go further, and have virtual file descriptors, like 
virtual
   memory. but that is something for the os, i guess.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-21 Thread Benjamin Franksen
Benjamin Franksen wrote:
 Bertram Felgenhauer wrote:
 Having to rely on GC to close the fds quickly enough is another problem;
 can this be solved on the library side, maybe by performing GCs when
 running out of FDs?
 
 Claus Reinke wrote:
 in good old Hugs, for instance, we find in function newHandle in
 src/iomonad.c
 [...snip...]
     /* Search for unused handle*/
     /* If at first we don't    */
     /* succeed, garbage collect*/
     /* and try again ...       */
     /* ... before we give up   */
 
 so, instead of documenting limitations and workarounds, this issue should
 be
 fixed in GHC as well.
 
 This may help in some cases but it cannot be relied upon. Finalizers are
 always run in a separate thread (must be, see
 http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if
 you force a GC when handles are exhausted, as hugs seems to do, there is
no
 guarantee that by the time the GC is done the finalizers have freed any
 handles (assuming that the GC run really detects any handles to be
 garbage).

Sorry for replying to myself, but I just realized that the argument brought
forth by Boehm applies only to general purpose finalizing facilites, and
not necessarily to each and every special case. I think one could make up
an argument that file handles in Haskell are indeed a special kind of
object and that the language runtime /can/ run finalizers for file handles
in a more 'synchronous' way (i.e. GC could call them directly as soon as it
determines they are garbage). The main point here is that a file descriptor
does not contain references to other language objects.

The same would apply to all sorts of OS resource handles. However, the whole
argument is a priori valid only for raw system handles, such as file
descriptors. No idea what issues come up if one considers e.g. buffering,
or more generally, any additional data structure that gets associated with
the handle.

Cheers
Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-20 Thread Ferenc Wagner
Matthew Brecknell [EMAIL PROTECTED] writes:

 enumLines :: (a - String - Either a a) - a - FilePath - IO a
 enumLines iter accum filename = do
   h - openFile filename ReadMode
   flip fix accum $
 \iterate accum - do
   try_line - try (hGetLine h)
   case try_line of
 Left e - hClose h  return accum
 Right line - do
   case iter accum line of
 Left accum - hClose h  return accum
 Right accum - iterate accum

Another variation, enabling multiple iteratees (like a state machine),
exception propagation, and no flip fix :)

 newtype Iterator a = Iterator (a - String - (a,Maybe (Iterator a)))
 
 enumLines :: Iterator a - a - FilePath - IO (a,Maybe Exception)
 enumLines iterator start filename = do
   h - openFile filename ReadMode
   let f (Iterator iter) accum = do
 try_line - try (hGetLine h)
 case try_line of
   Left e - hClose h  return (accum,Just e)
   Right line - do
 case iter accum line of
   (acc',Nothing)   - hClose h  return (acc',Nothing)
   (acc',Just cont) - f cont acc'
   f iterator start
-- 
Feri.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-20 Thread Benjamin Franksen
Bertram Felgenhauer wrote:
 Having to rely on GC to close the fds quickly enough is another problem;
 can this be solved on the library side, maybe by performing GCs when
 running out of FDs?

Claus Reinke wrote:
 in good old Hugs, for instance, we find in function newHandle in
src/iomonad.c
 [...snip...]
     /* Search for unused handle*/
     /* If at first we don't    */
     /* succeed, garbage collect*/
     /* and try again ...       */
     /* ... before we give up   */
 
 so, instead of documenting limitations and workarounds, this issue should
be
 fixed in GHC as well.

This may help in some cases but it cannot be relied upon. Finalizers are
always run in a separate thread (must be, see
http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if
you force a GC when handles are exhausted, as hugs seems to do, there is no
guarantee that by the time the GC is done the finalizers have freed any
handles (assuming that the GC run really detects any handles to be
garbage).

Cheers
Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier
Matthew Brecknell [EMAIL PROTECTED] writes:

 Pete Kazmier:
 I attempted to read Oleg's fold-stream implementation [1] as this
 sounds quite appealing to me, but I was completely overwhelmed,
 especially with all of the various type signatures used.  It would be
 great if one of the regular Haskell bloggers (Tom Moertel are you
 reading this?) might write a blog entry or two interpreting his
 implementation for those of us starting out in Haskell perhaps by
 starting out with a non-polymorphic version so as to emphasize the
 approach.
 
 [1] http://okmij.org/ftp/Haskell/fold-stream.lhs

 The basic idea of the paper is the use of a left-fold operator as the
 primary interface for enumarating collections. The recursive version
 (less general than the non-recursive version) of a left-fold operator
 for enumerating the lines of a text file might look something like this:

 import Control.Monad.Fix
 import Control.Exception
 import Data.List
 import qualified Data.Set as S
 import qualified Data.Map as M
 import System.IO
 
 enumLines :: (a - String - Either a a) - a - FilePath - IO a
 enumLines iter accum filename = do
   h - openFile filename ReadMode
   flip fix accum $
 \iterate accum - do
   try_line - try (hGetLine h)
   case try_line of
 Left e - hClose h  return accum
 Right line - do
   case iter accum line of
 Left accum - hClose h  return accum
 Right accum - iterate accum

I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.

 To use this, you provide an iteratee, a function which takes an
 accumulator and a line from the file, and returns a new accumulator
 embedded in an Either. Using the Left branch causes immediate
 termination of the enumeration. For example, to search for the first
 occurrence of each of a set of email headers:

 getHeaders :: S.Set String - FilePath - IO (S.Set String, M.Map String 
 String)
 getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
   findHdrs accum@(wanted,found) line =
 if null line
   then Left accum
   else
 case headerLine line of
   Nothing - Right accum
   Just hdr -
 case findDelete hdr wanted of
   Nothing - Right accum
   Just wanted -
 let accum = (wanted, M.insert hdr line found) in
   if S.null wanted
 then Left accum
 else Right accum
 
 headerLine :: String - Maybe String
 headerLine (':':xs) = Just []
 headerLine (x:xs) = fmap (x:) (headerLine xs)
 headerLine [] = Nothing
 
 findDelete :: Ord a = a - S.Set a - Maybe (S.Set a)
 findDelete e s = if S.member e s
   then Just (S.delete e s)
   else Nothing

 It's a bit of a case-analysis nightmare, but when comparing this to
 previous approaches, note that file traversal and processing are cleanly
 separated, file handle closure is guaranteed to be timely, file
 traversal stops as soon as all the required headers have been found,
 memory usage is minimised.

Very nice.  I like the clean separation, but as you say, its one ugly
bit of code compared to my original code, although much more elegant
no doubt.

 I hope that helps.

Very much so.  Thank you for you help.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier
Pete Kazmier [EMAIL PROTECTED] writes:

 I attempted to read Oleg's fold-stream implementation [1] as this
 sounds quite appealing to me, but I was completely overwhelmed,
 especially with all of the various type signatures used.  It would be
 great if one of the regular Haskell bloggers (Tom Moertel are you
 reading this?) might write a blog entry or two interpreting his
 implementation for those of us starting out in Haskell perhaps by
 starting out with a non-polymorphic version so as to emphasize the
 approach.

 [1] http://okmij.org/ftp/Haskell/fold-stream.lhs

In the event any other Haskell newbie comes along someday and is just
as overwhelmed as I was, I've found this post by Oleg to be a much
easier to understand than the above paper because it is not as generic
and thus the type signatures are a bit easier on the eyes:

http://www.haskell.org/pipermail/haskell/2003-September/012741.html

With that said, I have a question regarding Hal's response to the
above email in which he states:

 Just thought I'd mention that this is, in fact, my preferred method of
 iterating over a file.  It alleviates the pain associated with lazy file
 IO, and simultaneously provides a useful abstraction.  I actually have
 3*2 functions that I use which look like:
 
  type Iteratee  iter seed = seed - iter - Either seed seed
  hFoldChars  :: FilePath - Iteratee  Char seed - seed - IO seed
  hFoldLines  :: FilePath - Iteratee  String   seed - seed - IO seed
  hFoldWords  :: FilePath - Iteratee  [String] seed - seed - IO seed
 
  type IterateeM iter seed = seed - iter - IO (Either seed seed)
  hFoldCharsM :: FilePath - IterateeM Char seed - seed - IO seed
  hFoldLinesM :: FilePath - IterateeM String   seed - seed - IO seed
  hFoldWordsM :: FilePath - IterateeM [String] seed - seed - IO seed
 
 Which perform as expected (hFoldWords(M) can be written in terms of
 hFoldLinesM, but I find I use it sufficiently frequently to warrent
 having it stand out).  Also, of course, the only ones actually
 implemented are the (M) variants; the non-M variants just throw a return
 into the Iteratee.

What does he mean by the very last sentence?  Oleg's version seems
more like the non-M versions.  What is his implication?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Bryan O'Sullivan

Pete Kazmier wrote:


I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.


If you look at the code, that's not really what's happening.  See the 
embedded anonymous function below?


  flip fix accum $
 \iterate accum - do
   ...

It's a function of two arguments.  All flip is doing is switching the 
order of the arguments to fix, in this case for readability.  If you 
were to get rid of the flip, you'd need to remove the accum after 
fix and move it after the lambda expression, which would make the 
expression much uglier to write and read.  So all the flip is doing 
here is tidying up the code.


(If you're still confused, look at the difference between forM and mapM. 
 The only reason forM exists is readability when you have - in terms of 
the amount of screen space they consume - a big function and a small 
piece of data, just as here.)


As to why it's okay to call flip on fix at all, look at the types 
involved.


fix :: (a - a) - a
flip :: (a - b - c) - b - a - c

By substitution:

flip fix :: a - ((a - b) - a - b) - b

In the case above, accum has type a, and the lambda has type
(a - IO a) - a - IO a, and these fit nicely into the type expected by 
flip fix.


b
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Lennart Augustsson

Here's what happens:
fix has type (x-x)-x
and that has to match the first argument to flip, namely 'a-b-c'.
The only chance of that is if x is actually a function type.
Pick x=b-c, now we have
fix has type ((b-c)-b-c)-b-c
and it matches a-b-c if a=(b-c)-b-c

Flip returns b-a-c, and if we substitute we get
b-((b-c)-b-c)-c

If you rename the variables you get the suggested type.

-- Lennart


On Mar 19, 2007, at 20:35 , Pete Kazmier wrote:


Bryan O'Sullivan [EMAIL PROTECTED] writes:


Pete Kazmier wrote:


I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.



As to why it's okay to call flip on fix at all, look at the types
involved.

fix :: (a - a) - a
flip :: (a - b - c) - b - a - c

By substitution:

flip fix :: a - ((a - b) - a - b) - b


Sadly, I'm still confused.  I understand how 'flip' works in the case
where its argument is a function that takes two arguments.  I've
started to use this in my own code lately.  But my brain refuses to
understand how 'flip' is applied to 'fix', a function that takes one
argument only, which happens to be a function itself.  What is 'flip'
flipping when the function passed to it only takes one argument?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Isaac Dupree
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Pete Kazmier wrote:
 Bryan O'Sullivan [EMAIL PROTECTED] writes:
 
 Pete Kazmier wrote:

 I understand the intent of this code, but I am having a hard time
 understanding the implementation, specifically the combination of
 'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
 one can call 'flip' on a function that takes one argument.
 
 As to why it's okay to call flip on fix at all, look at the types
 involved.

 fix :: (a - a) - a
 flip :: (a - b - c) - b - a - c

 By substitution:

 flip fix :: a - ((a - b) - a - b) - b
 
 Sadly, I'm still confused.  I understand how 'flip' works in the case
 where its argument is a function that takes two arguments.  I've
 started to use this in my own code lately.  But my brain refuses to
 understand how 'flip' is applied to 'fix', a function that takes one
 argument only, which happens to be a function itself.  What is 'flip'
 flipping when the function passed to it only takes one argument?

fix :: (a - a) - a
In this case, we know something about 'a': it is a function (b - c).
Substitute:
fix :: ((b - c) - (b - c)) - (b - c)
Take advantage of the right-associativity of (-)
fix :: ((b - c) - b - c) - b - c
Now it looks like a function of two arguments, because the return value
(normally ordinary data) can in fact, in this case, take arguments.

Here's another example of that:

data Box a = Box a
get (Box a) = a
- -- get (Box 1) :: Int
- -- get (Box (\a - a)) :: Int - Int
- -- (get (Box (\a - a))) 1 :: Int
 --function application is left-associative:
- -- get (Box (\a - a)) 1 :: Int
- -- flip get 1 (Box (\a - a)) :: Int

Yes, it sometimes confuses me too.

Isaac
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF/vcXHgcxvIWYTTURAj5RAKCUMeAF0vosJ6ROAVlBIDHsEq/vzgCfflnR
50BmW6tuAF6mKXBtrlHdQ5Y=
=uv3G
-END PGP SIGNATURE-
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-17 Thread Pete Kazmier
Matthew Brecknell [EMAIL PROTECTED] writes:

 So here's a test. I don't have any big maildirs handy, so this is based
 on the simple exercise of printing the first line of each of a large
 number of files. First, the preamble.

 import Control.Exception (bracket)
 import System.Environment
 import System.IO

 main = do
   t:n:fs - getArgs
   ([test0,test1,test2,test3] !! read t) (take (read n) $ cycle fs)
 
 [snip]

Thank you for summarizing the approaches presented by others.  As a
Haskell newbie, there seems to be quite a few esoteric concepts to
conquer.  Your concrete examples were helpful in my understanding of
the ramifications associated with the various approaches.

After reading the various threads you cited, I decided to avoid lazy
IO altogether.  By using 'readFile' without forcing the strict
evaluation of my parser, I inadvertently relinquished control of the
resource management--closing of the file handles was left to the GC.
And although I could have used 'seq' to address the issue, why bother
fixing a problem that could have been avoided altogther by using
strict IO.

With that said, I added the following function to my program and then
replaced the invocation of 'readFile' with it:

  readEmailHeaders :: FilePath - IO String
  readEmailHeaders file = 
  bracket (openFile file ReadMode) (hClose) (headers [])
  where
headers acc h = do
line - hGetLine h
case line of
  -- Stop reading file once we hit the empty separator
  -- line, no need to read the rest of the file (body).
   - return . concat . reverse $ acc
  _  - headers (\n:line:acc) h

I'm not sure if this is the best implementation, but the speed is
comparable to the lazy IO version without the annoying defect of
running out of file handles.  I also tried an implementation using
'hGetChar' but that was much slower.

I attempted to read Oleg's fold-stream implementation [1] as this
sounds quite appealing to me, but I was completely overwhelmed,
especially with all of the various type signatures used.  It would be
great if one of the regular Haskell bloggers (Tom Moertel are you
reading this?) might write a blog entry or two interpreting his
implementation for those of us starting out in Haskell perhaps by
starting out with a non-polymorphic version so as to emphasize the
approach.

Thanks,
Pete

[1] http://okmij.org/ftp/Haskell/fold-stream.lhs

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Pete Kazmier
[EMAIL PROTECTED] (Donald Bruce Stewart) writes:

 pete-expires-20070513:
 When using readFile to process a large number of files, I am exceeding
 the resource limits for the maximum number of open file descriptors on
 my system.  How can I enhance my program to deal with this situation
 without making significant changes?

 Read in data strictly, and there are two obvious ways to do that:

 -- Via strings:

 readFileStrict f = do
 s - readFile f
 length s `seq` return s

 -- Via ByteStrings
 readFileStrict  = Data.ByteString.readFile
 readFileStrictString  = liftM Data.ByteString.unpack 
 Data.ByteString.readFile

 If you're reading more than say, 100k of data, I'd use strict
 ByteStrings without hesitation. More than 10M, and I'd use lazy
 bytestrings.

Correct me if I'm wrong, but isn't this exactly what I wanted to
avoid?  Reading the entire file into memory?  In my previous email, I
was trying to state that I wanted to lazily read the file because some
of the files are quite large and there is no reason to read beyond the
small set of headers.  If I read the entire file into memory, this
design goal is no longer met.

Nevertheless, I was benchmarking with ByteStrings (both lazy and
strict), and in both cases, the ByteString versions of readFile yield
the same error regarding max open files.  Incidentally, the lazy
bytestring version of my program was by far the fastest and used the
least amount of memory, but it still crapped out regarding max open
files. 

So I'm back to square one.  Any other ideas?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Donald Bruce Stewart
pete-expires-20070513:
 [EMAIL PROTECTED] (Donald Bruce Stewart) writes:
 
  pete-expires-20070513:
  When using readFile to process a large number of files, I am exceeding
  the resource limits for the maximum number of open file descriptors on
  my system.  How can I enhance my program to deal with this situation
  without making significant changes?
 
  Read in data strictly, and there are two obvious ways to do that:
 
  -- Via strings:
 
  readFileStrict f = do
  s - readFile f
  length s `seq` return s
 
  -- Via ByteStrings
  readFileStrict  = Data.ByteString.readFile
  readFileStrictString  = liftM Data.ByteString.unpack 
  Data.ByteString.readFile
 
  If you're reading more than say, 100k of data, I'd use strict
  ByteStrings without hesitation. More than 10M, and I'd use lazy
  bytestrings.
 
 Correct me if I'm wrong, but isn't this exactly what I wanted to
 avoid?  Reading the entire file into memory?  In my previous email, I
 was trying to state that I wanted to lazily read the file because some
 of the files are quite large and there is no reason to read beyond the
 small set of headers.  If I read the entire file into memory, this
 design goal is no longer met.
 
 Nevertheless, I was benchmarking with ByteStrings (both lazy and
 strict), and in both cases, the ByteString versions of readFile yield
 the same error regarding max open files.  Incidentally, the lazy
 bytestring version of my program was by far the fastest and used the
 least amount of memory, but it still crapped out regarding max open
 files. 
 
 So I'm back to square one.  Any other ideas?

Hmm. Ok. So we need to have more hClose's happen somehow. Can you
process files one at a time?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Dougal Stanton
Quoth Pete Kazmier, nevermore,
 the same error regarding max open files.  Incidentally, the lazy
 bytestring version of my program was by far the fastest and used the
 least amount of memory, but it still crapped out regarding max open
 files. 

I've tried the approach you appear to be using and it can be tricky
to predict how the laziness will interact with the list of actions.

For example, I tried to download a temporary file, read a bit of data
out of it and then download another one. I thought I would save thinking
and use the same file name for each download: /tmp/feed.xml. What
happened was that it downloaded them all in rapid succession,
over-writing each one with the next and not actually reading the data
until the end. So I ended up parsing N identical copies of the final
file, instead of one of each.

You need to refactor how you map the functions so that fewer whole lists
are passed around. I'd guess that (1) is being executed in its entirety
before being passed to (2), but it's not until (2) that the file data is
actually used.

 main =
 getArgs  =
 mapM fileContentsOfDirectory = -- (1)
 mapM_ print . threadEmails . map parseEmail . concat -- (2)

This means there are a lot of files sitting open doing nothing. I've had
a lot of success by recreating this as:

 main = 
  getArgs =
  mapM_ readAndPrint
   where readAndPrint = fileContentsOfDirectory = print -- etc.

It may seem semantically identical but it sometimes makes a difference
when things actually happen.

-- 
Dougal Stanton
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe