Here is a program:
import System
import Monad
import Char
import Directory

-- katalog = spool directory, nazwy = its entries,
-- posty = paths of the articles (entries with all-digit names)
main = do
    let katalog = "/var/spool/news/articles/pl/rec/hihot"
    nazwy <- getDirectoryContents katalog
    let posty = map (\s -> katalog++'/':s) $ filter (all isDigit) nazwy
    r <- liftM concat $ mapM readFile $ posty
    putStr r
    exitWith ExitSuccess
When compiled with ghc-4.02 under Linux (kernel 2.2.10, glibc 2.1),
it dies with SIGSEGV. The directory /var/spool/news/articles/pl/rec/hihot
contains over 3000 files. The program opens 576 of them and dies. The
same happens with other large directories; it only dies after opening a
slightly different number of files, and that number also changes after
minor modifications to the program.
It's interesting that when I change one definition to:
    let posty = replicate 1000 "/dev/null"
        -- or another constant filename, even
        -- "/var/spool/news/articles/pl/rec/hihot/222000"
it works fine, and with more than 1024 files it simply fails with
`Reason: process file table full'. So I can't find a good test case
that would be easily reproducible by everyone.
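For anyone who wants to try this without a news spool, here is a rough sketch that fills a directory with distinct files whose names are all digits, which is what the filter in the program selects. It is only an approximation of the real spool, and there is no claim that it triggers the crash. The helper name makeTestDir and the file contents are made up for illustration, and the module name System.Directory assumes a current GHC rather than ghc-4.02.

```haskell
import System.Directory (createDirectoryIfMissing)

-- Hypothetical helper: create `dir` if needed and fill it with
-- n small files named "1", "2", ..., one per fake article.
makeTestDir :: FilePath -> Int -> IO ()
makeTestDir dir n = do
  createDirectoryIfMissing True dir
  mapM_ writeArticle [1 .. n]
  where
    -- writeFile opens, writes and closes each file in turn,
    -- so this step never holds more than one descriptor.
    writeArticle i =
      writeFile (dir ++ "/" ++ show i) ("article " ++ show i ++ "\n")
```

Calling `makeTestDir "/tmp/hihot-test" 3000` and pointing katalog at that directory gives a setup of about the same size as the one described above.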
The first question is how to prevent Haskell from opening all the
files before reading them? The following prevents overflowing the
file descriptor table and prevents the SIGSEGV:
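-- needs: import IO
readFile' filename = do
    h <- openFile filename ReadMode
    contents <- hGetContents h
    -- force the whole string before closing the handle;
    -- seq on contents alone only evaluates the first cons cell
    seq (length contents) (return ())
    hClose h
    return contents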
but it reads all the files into memory before processing them. How can
readFile' be written so that it opens files lazily and closes each file
once it has been consumed? I can't see a nice solution to the problem
other than doing the whole thing imperatively...
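One possible answer, as a minimal sketch and not a definitive implementation: defer the opening of each file with unsafeInterleaveIO, so that a file is opened only when the consumer reaches its first character, and rely on lazy readFile closing the handle once end of file has been read. The names lazyConcatFiles and catFiles are made up for illustration, and the module name System.IO.Unsafe assumes a current GHC rather than ghc-4.02.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Lazily concatenate the contents of the given files.
-- unsafeInterleaveIO postpones the whole action until the result
-- is demanded, so no file is opened when lazyConcatFiles returns.
lazyConcatFiles :: [FilePath] -> IO String
lazyConcatFiles [] = return ""
lazyConcatFiles (f : fs) = unsafeInterleaveIO $ do
  contents <- readFile f        -- opened only when first demanded
  rest <- lazyConcatFiles fs    -- itself deferred, opens nothing yet
  -- (++) touches `rest` only after `contents` has been read to
  -- end of file, at which point lazy readFile has closed its handle.
  return (contents ++ rest)

-- The imperative alternative: print one file at a time.
-- putStr consumes each file completely before the next is opened.
catFiles :: [FilePath] -> IO ()
catFiles = mapM_ (\f -> readFile f >>= putStr)
```

In the program above, `r <- lazyConcatFiles posty` would take the place of the line with `mapM readFile`. The price is that I/O errors show up while the string is being consumed, not at the call, which is why the function has "unsafe" in its ancestry.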
Besides this, there must be a bug in ghc. It shouldn't SIGSEGV.
BTW, has the problem with SIGSEGV on fflush been solved? And when
can I expect the next version of ghc? I also couldn't compile the
ghc sources, but I've seen the same problem reported by someone else.
--
__("< Marcin Kowalczyk * [EMAIL PROTECTED] http://kki.net.pl/qrczak/
\__/ GCS/M d- s+:-- a22 C+++>+++$ UL++>++++$ P+++ L++>++++$ E-
^^ W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP->+ t
QRCZAK 5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-
Problems with reading many files
Marcin 'Qrczak' Kowalczyk Wed, 30 Jun 1999 18:56:39 +0200 (MET DST)
