On 2008.04.28 09:27:27 -0700, David Roundy <[EMAIL PROTECTED]> scribbled 1.7K characters:
> On Mon, Apr 28, 2008 at 8:39 AM, Gwern Branwen <[EMAIL PROTECTED]> wrote:
> > > I'm not sure what you mean by co_slurpy being strict. It looks to me like
> > > it's got adequate unsafeInterleaveIO to make it lazy.
> > > --
> > > David Roundy
> >
> > Well, it does have plenty of unsafeInterleaveIO, that is true, but the
> > issue here is readFilePS: readFilePS is completely strict, it reads the
> > entire file into memory (per docs and implementation). So, actually running
> > readFilePS may get delayed to the last second, but once readFilePS gets
> > inspected, it'll immediately do its best to suck in all 9 gigs or whatever.
> >
> > This is why replacing readFilePS in co_slurp_helper with mmapFilePS is
> > such a time saver - it is lazy and pretends to read in all 9 gigs
> > immediately, but since with -s, we ultimately only read the first 4096
> > characters, only a little bit will ever actually get page-faulted into
> > memory.
> >
> > (The problem with mmapFilePS is that as lispy mentions, on my 64-bit
> > system, mmapFilePS can no longer handle >3 gig files while readFilePS
> > scaled up to at least 9gigs, albeit slowly.)
>
> The other problem is that mmapFilePS will cause darcs to fail entirely
> on large repositories (with more than 1k files) due to sucking up all
> the system's file handles. I think this is a more common use case in
> darcs than 9g files. Of course, we could refuse to mmap small files
> (we already do this for very small files), and that could alleviate
> the problem considerably.

(Just a side note; with Lispy's type sig changes, I can now handle >3 gig files
just fine, albeit more slowly than with readFilePS.)
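
To make the strict-versus-lazy difference concrete, here is a rough sketch in Python rather than in darcs's own Haskell. It is not the darcs code: read_strict_prefix and read_mmap_prefix are hypothetical stand-ins for what readFilePS and mmapFilePS are described as doing above, and the 1 MiB sample file is only for illustration.

```python
import mmap
import os
import tempfile

PREFIX = 4096  # per the thread, only the first 4096 characters are needed with -s


def read_strict_prefix(path, n=PREFIX):
    """Strict approach: load the whole file, then keep the first n bytes."""
    with open(path, "rb") as f:
        data = f.read()  # the entire file is read into memory here
    return data[:n]


def read_mmap_prefix(path, n=PREFIX):
    """Lazy approach: map the file and touch only the first n bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:n]  # slicing copies just n bytes out of the mapping


def make_sample(size=1 << 20):
    """Create a hypothetical 1 MiB sample file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path
```

Both functions return the same bytes; the difference is that the strict one pays for the whole file first, which is what hurts on a multi-gigabyte file.
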

Hm. I'm not sure about the file-handle failure. Perhaps you mean it'll fail on
32-bit systems? It works for me:
[EMAIL PROTECTED]:2849~/foo>echo "make sure we're using lispy's mmap version" && duh bigtempfile [ 6:04PM]
make sure we're using lispy's mmap version
3.9G bigtempfile
3.9G total
[EMAIL PROTECTED]:2850~/foo>cd ~/bin/ghc && darcs query manifest | wc [ 6:05PM]
aclocal.m4 compat/ configure.ac distrib/ ghc.spec.in
install-sh LICENSE quickcheck/ validate
ANNOUNCE compiler/ _darcs/ docs/ gmp/
InstallShield/ Makefile README WindowsInstaller/
bindisttest/ config.guess darcs-all driver/ HACKING libffi/
mk/ rts/
boot config.sub darcs.prof extra-gcc-opts.in includes/
libraries/ push-all utils/
1191 1234 33726
[EMAIL PROTECTED]:2851~/bin/ghc>echo "ok, so there's 1200 files here. Let's see whether whatsnew -s fails due to filehandles" && darcs whatsnew -s
ok, so there's 1200 files here. Let's see whether whatsnew -s fails due to filehandles
No changes!
[EMAIL PROTECTED]:2847~/bin/ghc>echo "maybe the problem was masked by the lack of changes?" && rm HACKING ANNOUNCE LICENSE README [ 6:07PM]
maybe the problem was masked by the lack of changes?
[EMAIL PROTECTED]:2848~/bin/ghc>whatsnew -s [ 6:07PM]
R ./ANNOUNCE
R ./HACKING
R ./LICENSE
R ./README
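
For anyone who wants to check the file-handle worry on their own machine, a small Unix-only sketch in Python follows. It assumes the worst case David describes, namely that every mapped file holds one descriptor for as long as it stays mapped; whether darcs actually behaves that way is not established here. count_files and fd_headroom are hypothetical helper names, and walking the tree is only a stand-in for the real manifest.

```python
import os
import resource  # Unix-only module


def count_files(root):
    """Count files under root (a rough stand-in for the manifest)."""
    return sum(len(files) for _, _, files in os.walk(root))


def fd_headroom(root):
    """Return (file_count, soft_limit, fits) for the tree at root.

    fits is True when one descriptor per file would stay under the
    process's soft limit on open file descriptors.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    n = count_files(root)
    fits = soft == resource.RLIM_INFINITY or n < soft
    return n, soft, fits
```

Calling fd_headroom on a repository root gives the file count next to the descriptor limit, which is the comparison that matters for the "more than 1k files" case.
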
> Another problem is that using mmap on files in the working directory
> can lead to segfaults, since the user is allowed to edit files in the
> working directory while darcs runs--or at least I don't want to
> segfault if the user does this.
>
> David
Hm, that does sound bad. Is there no way to handle this (set read-only, catch
exceptions, etc)? I'll admit I've never tried to edit files while using Darcs,
but that's just me.
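
One possibility, offered only as an assumption and not something proposed in this thread: since -s is said above to need just the first 4096 characters, an ordinary bounded read would copy that prefix into private memory, so a later edit or truncation of the file could not fault. A rough Python sketch, with read_prefix and demo as hypothetical names:

```python
import os
import tempfile


def read_prefix(path, n=4096):
    """Copy at most n bytes into private memory with an ordinary read."""
    with open(path, "rb") as f:
        return f.read(n)


def demo():
    """Read a prefix, then truncate the file as a user's editor might."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"a" * 10000)
    prefix = read_prefix(path)
    # Simulate the user emptying the file while we still hold the data.
    with open(path, "wb"):
        pass
    os.remove(path)
    return prefix  # still intact: it is a copy, not a view of the file
```

This gives up whatever mmap saves when more than the prefix is needed, so it only fits the -s case described above.
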
--
gwern
