On Thursday 17 September 2009 21:07:28, Cristiano Paris wrote:
> On Tue, Sep 15, 2009 at 11:31 PM, Daniel Fischer
> <daniel.is.fisc...@web.de> wrote:
> > ...
> > Yeah, you do *not* want the whole file to be read here, except above for
> > testing purposes.
>
> That's not true. Sometimes I want to, sometimes I don't.
The "for the case of sorting by metadata" was tacitly assumed :)

> But I want to use the same code for reading files and exploit laziness
> to avoid reading the body.
>
> > Still, ByteStrings are probably the better choice (if you want the body
> > and that can be large).
>
> That's not a problem for now.
>
> > To avoid reading the body without unsafePerformIO:
> >
> > readBit fn
> >     = Control.Exception.bracket (openFile fn ReadMode) hClose
> >         (\h -> do
> >             l <- hGetLine h
> >             let i = read l
> >             bdy <- hGetContents h
> >             return $ Bit i bdy)
>
> Same problem with the "withFile" version: nothing gets printed if I
> try to print out the body; that's why I used seq.

Ah, yes. The file is closed too soon: hGetContents is lazy, so bracket's
hClose runs before any of the body has actually been read.

> I'm starting to think that the only way to do this without using
> unsafePerformIO is to have the body being an IO action: simply, under
> Haskell's assumptions, that's not possible to write, because Haskell
> enforces safety above all.

Well, what about

readBit fn = do
    txt <- readFile fn
    let (l, _:bdy) = span (/= '\n') txt
    return $ Bit (read l) bdy

?

With

import Data.List (sortBy)
import Data.Ord (comparing)
import System.Environment (getArgs)

main = do
    args <- getArgs
    let n = case args of
              (a:_) -> read a
              _     -> 1000
    bl <- mapM readBit ["file1.txt","file2.txt"]
    mapM_ (putStrLn . show . index) $ sortBy (comparing index) bl
    mapM_ (putStrLn . take 20 . drop n . body) bl

./cparis3 30 +RTS -sstderr
2
3
CCGGGCGCGGTGGCTCACGC
CCGGGCGCGGTGGCTCACGC
         408,320 bytes allocated in the heap
           1,220 bytes copied during GC
          34,440 bytes maximum residency (1 sample(s))
          31,096 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

./cparis3 20000 +RTS -sstderr
2
3
AAAATTAGCCGGGCGTGGTG
AAAATTAGCCGGGCGTGGTG
       1,069,168 bytes allocated in the heap
         105,700 bytes copied during GC
         137,356 bytes maximum residency (1 sample(s))
          27,344 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

./cparis3 2000000 +RTS -sstderr
2
3
CCTGGCCAACATGGTGAAAC
CCTGGCCAACATGGTGAAAC
      80,939,296 bytes allocated in the heap
       8,925,240 bytes copied during GC
         137,056 bytes maximum residency (2 sample(s))
          45,528 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  %GC time      38.5%  (27.0% elapsed)

  Alloc rate    1,264,577,704 bytes per MUT second

  Productivity  61.5% of total user, 38.8% of total elapsed

./cparis3 20000000 +RTS -sstderr
2
3
CAGAGCGAGACTCCGTCTCA
CAGAGCGAGACTCCGTCTCA
     806,034,756 bytes allocated in the heap
      76,775,944 bytes copied during GC
         136,876 bytes maximum residency (2 sample(s))
          43,324 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  1536 collections,     0 parallel,  0.35s,  0.35s elapsed
  Generation 1:     2 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.53s  (  0.67s elapsed)
  GC    time    0.35s  (  0.36s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.88s  (  1.02s elapsed)

  %GC time      40.0%  (34.9% elapsed)

  Alloc rate    1,526,482,681 bytes per MUT second

  Productivity  60.0% of total user, 51.7% of total elapsed

Seems to work as desired. Maximum residency stays around 137 KB whether the
program skips 20,000 or 20,000,000 characters into each body, so the bodies
are consumed lazily instead of being held in memory.

>
> Cristiano

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
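A self-contained sketch of the readFile-based readBit above, for trying the idea outside the original program. The thread never shows the Bit type, so its definition here (an Int index plus a String body) is an assumption, as are the hypothetical helper name parseBit and taking N and the file names from the command line instead of hard-coding file1.txt and file2.txt. The point it illustrates: the record is built from the header line only, and the body stays an unevaluated lazy String until something demands it.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)
import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)

-- Assumed record shape: the thread uses Bit, index and body but never
-- shows the definition.
data Bit = Bit { index :: Int, body :: String }

-- Hypothetical pure helper: the first line is the numeric index, everything
-- after the first newline is the body. Building the record only inspects
-- the header; the body is left as an unevaluated lazy String.
parseBit :: String -> Bit
parseBit txt = case span (/= '\n') txt of
  (l, _ : bdy) -> Bit (read l) bdy
  (l, [])      -> Bit (read l) ""  -- header only, no trailing newline

-- readFile reads lazily: forcing the index pulls in only the first buffer
-- of the file, and the rest is read only if the body is demanded.
readBit :: FilePath -> IO Bit
readBit fn = parseBit <$> readFile fn

main :: IO ()
main = do
  args <- getArgs
  case args of
    (nStr : files@(_ : _)) -> do
      -- n: how many characters of each body to skip before printing 20
      let n = read nStr :: Int
      bl <- mapM readBit files
      mapM_ (print . index) (sortBy (comparing index) bl)
      mapM_ (putStrLn . take 20 . drop n . body) bl
    _ -> hPutStrLn stderr "usage: prog N FILE..."
```

One caveat of lazy readFile: each file's handle stays open until its body has been read to the end (or the String is garbage-collected), so opening many files this way can exhaust file descriptors.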