dagit:
> On Thu, Sep 18, 2008 at 12:31 PM, Creighton Hogg <[EMAIL PROTECTED]> wrote:
> > On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> >> wchogg:
> >>> On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> > <snip>
> >>> > This makes me cry.
> >>> >
> >>> > import System.Environment
> >>> > import qualified Data.ByteString.Lazy.Char8 as B
> >>> >
> >>> > main = do
> >>> >     [f] <- getArgs
> >>> >     s   <- B.readFile f
> >>> >     print (B.count '\n' s)
> >>> >
> >>> > Compile it.
> >>> >
> >>> >     $ ghc -O2 --make A.hs
> >>> >
> >>> >     $ time ./A /usr/share/dict/words
> >>> >     52848
> >>> >     ./A /usr/share/dict/words  0.00s user 0.00s system 93% cpu 0.007 total
> >>> >
> >>> > Against standard tools:
> >>> >
> >>> >     $ time wc -l /usr/share/dict/words
> >>> >     52848 /usr/share/dict/words
> >>> >     wc -l /usr/share/dict/words  0.01s user 0.00s system 88% cpu 0.008 total
> >>>
> >>> So both you & Bryan do essentially the same thing and of course both
> >>> versions are far better than mine.  So the purpose of using the Lazy
> >>> version of ByteString was so that the file is only incrementally
> >>> loaded by readFile as count is processing?
> >>
> >> Yep, that's right
> >>
> >> The streaming nature is implicit in the lazy bytestring. It's kind of
> >> the dual of explicit chunkwise control -- chunk processing reified into
> >> the data structure.
> >
> > To ask an overly general question, if lazy bytestring makes a nice
> > provider for incremental processing are there reasons to _not_ reach
> > for that as my default when processing large files?
>
> Yes.  The main time is when you "accidentally" force the whole file
> (or at least large parts of it) into memory at the same time.
> Profiling and careful programming seem to be the workarounds, but in a
> large application the "careful programming" part can become
> prohibitively expensive.  This is due to the sometimes subtle nature
> of how strictness composes with laziness.
> This is the result of a more general issue: it is non-obvious how
> your program is evaluated at run-time thanks to lazy evaluation,
> which makes lazy evaluation a double-edged sword at times.  I'm not
> saying get rid of lazy eval, but occasionally it presents problems
> for efficiency and for diagnosing efficiency problems.
>
> The rule seems to be: Write correct code first, fix the problems
> (usually just inefficiencies) later.
>
> Using lazy bytestrings makes it easier to write concise code that is
> more easily inspected for correctness.  Perhaps it is even easier to
> test such code, but I'm skeptical of that.  Thus, I think most people
> here would agree that reaching first for lazy bytestrings is preferred
> over other techniques.  Plus, one of the most common fixes to
> inefficient Haskell programs is to make them lazy in the right places
> and strict in key places, and using lazy bytestrings will usually get
> you part of the way to that refactoring.
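A minimal sketch of the two points quoted above, assuming only the bytestring package. `countViaChunks` computes the same result as `L.count '\n'` by folding a strict `S.count` over `L.toChunks`, which makes the "chunk processing reified into the data structure" explicit. `countAndLength` is a hypothetical helper showing the kind of "accidental" forcing dagit describes: because `s` is shared by both components, the pending length computation keeps every chunk reachable while the count walks the whole input.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)
import Data.List (foldl')
import System.Environment (getArgs)

-- Count newlines by folding over the strict chunks of a lazy ByteString.
-- Only the current chunk needs to be live; chunks already consumed become
-- garbage once the fold has moved past them.
countViaChunks :: L.ByteString -> Int64
countViaChunks = foldl' step 0 . L.toChunks
  where
    step acc chunk = acc + fromIntegral (S.count '\n' chunk)

-- Hypothetical helper illustrating "accidental" forcing: s is used twice,
-- so while the first component traverses the whole input, the unevaluated
-- second component still references s and keeps all of its chunks alive.
countAndLength :: L.ByteString -> (Int64, Int64)
countAndLength s = (L.count '\n' s, L.length s)

main :: IO ()
main = do
  args <- getArgs
  s <- case args of
    [f] -> L.readFile f
    -- With no file argument, fall back to a small in-memory value
    -- built from three strict chunks.
    _   -> return (L.fromChunks (map S.pack ["one\ntwo", "\nthree", "\n"]))
  print (countViaChunks s)
```

Run on the file from the thread, `countViaChunks` should agree with `wc -l`, since both count newline bytes. Swapping in `countAndLength` on a large file is an easy way to see the difference in a heap profile.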

Work on the "dual" of lazy bytestrings -- chunked enumerators -- may
lead to more options in this area. The question of compositionality of
left-fold enumerators remains (afaik), but we'll see.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
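For comparison, a rough sketch of the explicit, chunkwise style that Don contrasts with lazy bytestrings: a left-fold enumerator that owns the `Handle`, reads fixed-size strict chunks, and feeds them to a step function that may ask to stop early. `Step`, `enumHandle`, and `countUpTo` are hypothetical names chosen for illustration, not the API of any particular enumerator library, and the sketch does nothing to address the compositionality question raised above.

```haskell
import qualified Data.ByteString.Char8 as S
import System.Environment (getArgs)
import System.IO

-- Hypothetical: what the consumer tells the enumerator after each chunk.
data Step a = Continue a | Done a

-- Hypothetical left-fold enumerator: the enumerator, not the consumer,
-- controls reading. It pulls strict chunks of up to chunkSize bytes from
-- the handle and folds them into the accumulator until EOF or until the
-- step function returns Done.
enumHandle :: Int -> Handle -> (a -> S.ByteString -> Step a) -> a -> IO a
enumHandle chunkSize h step = go
  where
    go acc = do
      chunk <- S.hGet h chunkSize
      if S.null chunk
        then return acc                 -- end of file
        else case step acc chunk of
          Continue acc' -> acc' `seq` go acc'
          Done acc'     -> return acc'

-- Example consumer: count newlines, but stop reading further chunks once
-- at least `limit` newlines have been seen.
countUpTo :: Int -> Int -> S.ByteString -> Step Int
countUpTo limit acc chunk
  | acc' >= limit = Done acc'
  | otherwise     = Continue acc'
  where
    acc' = acc + S.count '\n' chunk

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- maxBound as the limit means: count every line in the file.
    [f] -> withBinaryFile f ReadMode $ \h ->
             enumHandle 32768 h (countUpTo maxBound) 0 >>= print
    _   -> hPutStrLn stderr "usage: enum FILE"
```

Here the resource lifetime is explicit (`withBinaryFile` closes the handle as soon as the fold returns), at the cost of writing consumers as step functions rather than as ordinary list-like code over a lazy bytestring.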
