Beginners Digest, Vol 57, Issue 33

beginners-request Sat, 23 Mar 2013 18:08:28 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]


You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1.  stack overflow summing numbers read from a big   file (Axel Wegen)
   2. Re:  stack overflow summing numbers read from a big file
      (mukesh tiwari)
   3. Re:  stack overflow summing numbers read from a   big file
      (Axel Wegen)
   4. Re:  stack overflow summing numbers read from a big file
      (mukesh tiwari)
   5. Re:  stack overflow summing numbers read from a big file
      (Graham Gill)
   6.  binary parsing problem (Alexander Polakov)


----------------------------------------------------------------------

Message: 1
Date: Sat, 23 Mar 2013 12:42:41 +0000
From: Axel Wegen <[email protected]>
Subject: [Haskell-beginners] stack overflow summing numbers read from
        a big   file
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain

When I try to run the following code on a 50M file consisting of one
number per line I receive a `Stack space overflow' error and the program
seems to consume a lot of memory. Why does that happen? And how can I
fix the problem without increasing the Stack with -Ksize hoping that it
will be big enough? Any general advice on processing big files with
Haskell?

-- sumFile.hs
-- adapted from Real World Haskell's SumFile.hs at the beginning of
-- Chapter 8
main = interact sumFile
  where sumFile = show . sum . map read . words

ghc -o sumFile sumFile.hs  
./sumFile < ./numbers   



------------------------------

Message: 2
Date: Sat, 23 Mar 2013 17:41:10 +0530
From: mukesh tiwari <[email protected]>
Subject: Re: [Haskell-beginners] stack overflow summing numbers read
        from a big file
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <[email protected]>
Message-ID:
        <CAFHZvE9+yKSyq2At=0dczrsvn+nd_bc8ux7mdw68uycgr84...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Axel
If you see the type of your function
Prelude> :t (  show . sum . map read . words )
(  show . sum . map read . words ) :: String -> String
It takes a string and return string.

On Sat, Mar 23, 2013 at 6:12 PM, Axel Wegen <[email protected]> wrote:

> When I try to run the following code on a 50M file consisting of one
> number per line I receive a `Stack space overflow' error and the program
> seems to consume a lot of memory. Why does that happen?
>

It's already mentioned there "A String is represented as a list of
Charvalues; each element of a list is allocated individually, and has
some
book-keeping overhead. These factors affect the memory consumption and
performance of a program that must read or write text or binary data. On
simple benchmarks like this, even programs written in interpreted languages
such as Python can outperform Haskell code that uses String by an order of
magnitude".


> And how can I
> fix the problem without increasing the Stack with -Ksize hoping that it
> will be big enough? Any general advice on processing big files with
> Haskell?
>

Each ByteString type performs better under particular circumstances. For
streaming a large quantity (hundreds of megabytes to terabytes) of data,
the lazy ByteString type is usually best. Its chunk size is tuned to be
friendly to a modern CPU's L1 cache, and a garbage collector can quickly
discard chunks of streamed data that are no longer being used.


>
> -- sumFile.hs
> -- adapted from Real World Haskell's SumFile.hs at the beginning of
> -- Chapter 8
> main = interact sumFile
>   where sumFile = show . sum . map read . words
>
> ghc -o sumFile sumFile.hs
> ./sumFile < ./numbers
>

See if this code is working ( I haven't tested it on big file )

import qualified Data.ByteString.Lazy.Char8 as BS
import Data.Maybe ( fromJust )

readI :: BS.ByteString -> Integer
readI = fst . fromJust . BS.readInteger

main = BS.interact sumFile where
     sumFile =  BS.pack . show . sum . map readI . BS.words


-Mukesh

>
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://www.haskell.org/pipermail/beginners/attachments/20130323/f5bbd14e/attachment-0001.htm>

------------------------------

Message: 3
Date: Sat, 23 Mar 2013 15:29:38 +0000
From: Axel Wegen <[email protected]>
Subject: Re: [Haskell-beginners] stack overflow summing numbers read
        from a  big file
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain

mukesh tiwari <[email protected]> writes:
> It's already mentioned there "A String is represented as a list of
> Char values; each element of a list is allocated individually, and has
> some book-keeping overhead. These factors affect the memory
> consumption and performance of a program that must read or write text
> or binary data. On simple benchmarks like this, even programs written
> in interpreted languages such as Python can outperform Haskell code
> that uses String by an order of magnitude".

I assumed that just means using plain Strings for that job will take
more time.

> import qualified Data.ByteString.Lazy.Char8 as BS
> import Data.Maybe ( fromJust )
>
> readI :: BS.ByteString -> Integer
> readI = fst . fromJust . BS.readInteger 
>
> main = BS.interact sumFile where
> sumFile = BS.pack . show . sum . map readI . BS.words

I get the same result, the stack overflow. Though I don't have wait as
long for it to break.

I think that there is something about the summation that makes it
impossible for the compiler to do it's magic and optimize the thing to
something less stack overflowing. I just don't understand what that is.

--
Axel Wegen



------------------------------

Message: 4
Date: Sat, 23 Mar 2013 20:08:32 +0530
From: mukesh tiwari <[email protected]>
Subject: Re: [Haskell-beginners] stack overflow summing numbers read
        from a big file
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <[email protected]>
Message-ID:
        <CAFHZvE8kMaE5E4ijaQMbddC=_fgfyvmbqnkyhqhcx8ope4h...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Alex
I am no expert but try to profile your code and see the memory usage. It
seems like the  sum function is causing the stack overflow[1].
Try this one.
import qualified Data.ByteString.Lazy.Char8 as BS
import Data.Maybe ( fromJust )
import Data.List

readI :: BS.ByteString -> Integer
readI = fst . fromJust . BS.readInteger

main = BS.interact $ sumFile   where
     sumFile =  BS.pack . show . sum' . map readI . BS.words
     sum' = foldl' (+) 0

-Mukesh

[1] http://www.haskell.org/haskellwiki/Memory_leak


On Sat, Mar 23, 2013 at 8:59 PM, Axel Wegen <[email protected]> wrote:

> mukesh tiwari <[email protected]> writes:
> > It's already mentioned there "A String is represented as a list of
> > Char values; each element of a list is allocated individually, and has
> > some book-keeping overhead. These factors affect the memory
> > consumption and performance of a program that must read or write text
> > or binary data. On simple benchmarks like this, even programs written
> > in interpreted languages such as Python can outperform Haskell code
> > that uses String by an order of magnitude".
>
> I assumed that just means using plain Strings for that job will take
> more time.
>
> > import qualified Data.ByteString.Lazy.Char8 as BS
> > import Data.Maybe ( fromJust )
> >
> > readI :: BS.ByteString -> Integer
> > readI = fst . fromJust . BS.readInteger
> >
> > main = BS.interact sumFile where
> > sumFile = BS.pack . show . sum . map readI . BS.words
>
> I get the same result, the stack overflow. Though I don't have wait as
> long for it to break.
>
> I think that there is something about the summation that makes it
> impossible for the compiler to do it's magic and optimize the thing to
> something less stack overflowing. I just don't understand what that is.
>
> --
> Axel Wegen
>
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://www.haskell.org/pipermail/beginners/attachments/20130323/702d8eb9/attachment-0001.htm>

------------------------------

Message: 5
Date: Sat, 23 Mar 2013 11:41:14 -0400
From: Graham Gill <[email protected]>
Subject: Re: [Haskell-beginners] stack overflow summing numbers read
        from a big file
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I think the problem is with laziness in the accumulator of "sum".

The prelude "sum" is defined as

sum     l       = sum' l 0
   where
     sum' []     a = a
     sum' (x:xs) a = sum' xs (a+x)

For example, if in GHCi I execute

> sum [1..50000000]
<interactive>: out of memory

I get an out of memory exception.

On the other hand, if I enable -XBangPatterns and define

mysum xs = sum' xs 0
   where sum' [] a = a
         sum' (x:xs) !a = sum' xs (x + a)


Then in GHCi

> mysum [1..50000000]
1250000025000000

which as you can easily check using n * (n + 1) / 2 with n = 50000000 is 
correct.

I don't know whether I have used the strictness annotation in the best 
way, but it works.

See http://www.haskell.org/haskellwiki/Performance/Strictness, which had 
some helpful hints.

Graham

On 23/03/2013 11:29 AM, Axel Wegen wrote:
> mukesh tiwari <[email protected]> writes:
>> It's already mentioned there "A String is represented as a list of
>> Char values; each element of a list is allocated individually, and has
>> some book-keeping overhead. These factors affect the memory
>> consumption and performance of a program that must read or write text
>> or binary data. On simple benchmarks like this, even programs written
>> in interpreted languages such as Python can outperform Haskell code
>> that uses String by an order of magnitude".
> I assumed that just means using plain Strings for that job will take
> more time.
>
>> import qualified Data.ByteString.Lazy.Char8 as BS
>> import Data.Maybe ( fromJust )
>>
>> readI :: BS.ByteString -> Integer
>> readI = fst . fromJust . BS.readInteger
>>
>> main = BS.interact sumFile where
>> sumFile = BS.pack . show . sum . map readI . BS.words
> I get the same result, the stack overflow. Though I don't have wait as
> long for it to break.
>
> I think that there is something about the summation that makes it
> impossible for the compiler to do it's magic and optimize the thing to
> something less stack overflowing. I just don't understand what that is.
>
> --
> Axel Wegen
>
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners




------------------------------

Message: 6
Date: Sun, 24 Mar 2013 05:07:32 +0400
From: Alexander Polakov <[email protected]>
Subject: [Haskell-beginners] binary parsing problem
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

I'm trying to parse a binary format. 
One of the components is defined like this:

"BITDOUBLE: 
 
 1st 2 bits : what it is
  
         00 : A double follows 
         01 : 1.0 
         10 : 0.0 
         11 : not used 

 Doubles are eight byte IEEE standard floating point values."

So I wrote this code which looks almost like verbatim translation
of the spec.

import Data.Binary
import Data.Binary.Get
import Control.Applicative
import qualified Data.Binary.Bits.Get as Bits

newtype DWG_BD = DWG_BD Double deriving (Show)

instance Binary DWG_BD where
    put = undefined
    get = do
        d <- Bits.runBitGet $ Bits.getWord8 2 >>= \i ->
             case i of
                    0 -> decode <$> Bits.getLazyByteString 8
                    1 -> return 1.0
                    2 -> return 0.0
                    _ -> fail "bad DWG_BD"
        return (DWG_BD d)

But then when I try to parse anything starting with 00 bits, I get this:

*Main> decodeFile "/tmp/foo" :: IO DWG_BD
DWG_BD *** Exception: Data.Binary.Get.runGet at position 2: demandInput:
not enough bytes

I'm 100% sure there're enough bytes in the file. I recall seeing this
problem when my record's fields were not strict, but this is not the
case here. I suppose the problem lies in getLazyByteString, but have 
no idea how to solve it.

-- 
Alexander Polakov | plhk.ru



------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 57, Issue 33
*****************************************

Beginners Digest, Vol 57, Issue 33

Reply via email to