Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  Diagnosing : Large memory usage + low CPU (Hugo Ferreira)
   2.  Parallelism? (Michael Craig)
   3. Re:  Parallelism? (Edward Z. Yang)
   4. Re:  Parallelism? (Edward Z. Yang)
   5. Re:  Diagnosing : Large memory usage + low CPU (Edward Z. Yang)


----------------------------------------------------------------------

Message: 1
Date: Wed, 30 Nov 2011 14:23:53 +0000
From: Hugo Ferreira <[email protected]>
Subject: Re: [Haskell-beginners] Diagnosing : Large memory usage + low
        CPU
To: Stephen Tetley <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello,

On 11/29/2011 10:57 PM, Stephen Tetley wrote:
> Hi Hugo
>
> What is a POSTags and how big do you expect it to be?
>
>

type Token = String
type Tag = String

type NGramTag = (Token, Tag, Tag)

type POSTags = Z.Zipper NGramTag


> Generally I'd recommend you first try to calculate the size of your
> data rather than try to strictify things, see Johan Tibell's very
> useful posts:
>
>
> http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.html
> http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html
>

According to size in String I am expecting a maximum of 50 Mega.
Profiling (after a painful 80 minutes) shows:

total alloc = 20,350,382,592 bytes

Way too much.

> Once you know the size of your data - you can decide if it is too big
> to comfortably work with in memory. If it is too big you need to make
> sure you're are streaming[*] it rather than forcing it into memory.
>
> If POSTags is large, I'd be very concerned about the top line of
> updateState - reversing lists (or sorting them) simply doesn't play
> well with streaming.
>

The zipper does quite a bit of reversing and appending.
I also need to reverse lists to retain the order of the
characters (text). I also do sorting but I have eliminated this
in the tests.

So my question: how can one "force" the reversing and append?
Anyone?

TIA,
Hugo F.


>
> [*] Even in a lazy language like Haskell, streaming data isn't
> necessarily automatic.
>
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners
>




------------------------------

Message: 2
Date: Thu, 1 Dec 2011 00:50:15 -0500
From: Michael Craig <[email protected]>
Subject: [Haskell-beginners] Parallelism?
To: [email protected]
Message-ID:
        <caha9zagoq4rxoskveds53cowrzyreywm40d1xwa_fn8mq8v...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

I was writing some parallel code (asynchronous database writes for an event
logger, but that's besides the point), and it seemed like the parallelized
version (i.e. compiled with -threaded -with-rtsopts=-N2) wasn't running
fast enough. I boiled it down to a dead-simple test:

    import Control.Concurrent
    import Data.Time.Clock.POSIX
    import System.Environment

    main :: IO ()
    main = do
        n <- getArgs >>= return . read . head
        t1 <- getPOSIXTime
        work n
        t2 <- getPOSIXTime
        putStrLn $ show $ t2 - t1
        putStrLn $ show $ (fromIntegral n :: Double)
                        / (fromRational . toRational $ t2 - t1)

    work :: Integer -> IO ()
    work n = do
      forkIO $ putStrLn $ seq (fact n) "Done"
      putStrLn $ seq (fact n) "Done"

    fact :: Integer -> Integer
    fact 1 = 1
    fact n = n * fact (n - 1)

(I know this is not the best way to time things but I think it suffices for
this test.)

Compiled with ghc --make -O3 test.hs, ./test 500000 runs for 74 seconds.
Compiling with ghc --make -O3 -threaded -with-rtsopts=-N, ./test 500000
runs for 82 seconds (and seems to be using 2 cpu cores instead of just 1,
on a 4-core machine). What gives?

Mike S Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://www.haskell.org/pipermail/beginners/attachments/20111201/d3589684/attachment-0001.htm>

------------------------------

Message: 3
Date: Thu, 01 Dec 2011 01:27:48 -0500
From: "Edward Z. Yang" <[email protected]>
Subject: Re: [Haskell-beginners] Parallelism?
To: Michael Craig <[email protected]>
Cc: beginners <[email protected]>
Message-ID: <1322720161-sup-4071@ezyang>
Content-Type: text/plain; charset=UTF-8

Sounds like you're paying a x2 cost for loading up the threaded runtime
(compare -N1 -threaded with no flags.)  Not really sure why, it looks like
you're getting killed by GC.

Are you sure you want to be doing factorial on Integers?

Edward

Excerpts from Michael Craig's message of Thu Dec 01 00:50:15 -0500 2011:
> I was writing some parallel code (asynchronous database writes for an event
> logger, but that's besides the point), and it seemed like the parallelized
> version (i.e. compiled with -threaded -with-rtsopts=-N2) wasn't running
> fast enough. I boiled it down to a dead-simple test:
> 
>     import Control.Concurrent
>     import Data.Time.Clock.POSIX
>     import System.Environment
> 
>     main :: IO ()
>     main = do
>         n <- getArgs >>= return . read . head
>         t1 <- getPOSIXTime
>         work n
>         t2 <- getPOSIXTime
>         putStrLn $ show $ t2 - t1
>         putStrLn $ show $ (fromIntegral n :: Double)
>                         / (fromRational . toRational $ t2 - t1)
> 
>     work :: Integer -> IO ()
>     work n = do
>       forkIO $ putStrLn $ seq (fact n) "Done"
>       putStrLn $ seq (fact n) "Done"
> 
>     fact :: Integer -> Integer
>     fact 1 = 1
>     fact n = n * fact (n - 1)
> 
> (I know this is not the best way to time things but I think it suffices for
> this test.)
> 
> Compiled with ghc --make -O3 test.hs, ./test 500000 runs for 74 seconds.
> Compiling with ghc --make -O3 -threaded -with-rtsopts=-N, ./test 500000
> runs for 82 seconds (and seems to be using 2 cpu cores instead of just 1,
> on a 4-core machine). What gives?
> 
> Mike S Craig



------------------------------

Message: 4
Date: Thu, 01 Dec 2011 02:38:07 -0500
From: "Edward Z. Yang" <[email protected]>
Subject: Re: [Haskell-beginners] Parallelism?
To: Michael Craig <[email protected]>
Cc: beginners <[email protected]>
Message-ID: <1322724894-sup-1108@ezyang>
Content-Type: text/plain; charset=UTF-8

With regards to your original concurrent code (asynchronous database writes),
if the API given to you truly is asynchronous (i.e. it's a file descriptor that
could be monitored with epoll/kqueue/folks), consider integrating it with
the IO manager, so that you don't need to tie up real OS threads on blocking
FFI calls (though I'm not sure what database or what access mechanism is.)  You 
really
shouldn't need the threaded runtime for a task like this.

Maybe if you give more details we can give more specific advice.

Edward

Excerpts from Michael Craig's message of Thu Dec 01 00:50:15 -0500 2011:
> I was writing some parallel code (asynchronous database writes for an event
> logger, but that's besides the point), and it seemed like the parallelized
> version (i.e. compiled with -threaded -with-rtsopts=-N2) wasn't running
> fast enough. I boiled it down to a dead-simple test:
> 
>     import Control.Concurrent
>     import Data.Time.Clock.POSIX
>     import System.Environment
> 
>     main :: IO ()
>     main = do
>         n <- getArgs >>= return . read . head
>         t1 <- getPOSIXTime
>         work n
>         t2 <- getPOSIXTime
>         putStrLn $ show $ t2 - t1
>         putStrLn $ show $ (fromIntegral n :: Double)
>                         / (fromRational . toRational $ t2 - t1)
> 
>     work :: Integer -> IO ()
>     work n = do
>       forkIO $ putStrLn $ seq (fact n) "Done"
>       putStrLn $ seq (fact n) "Done"
> 
>     fact :: Integer -> Integer
>     fact 1 = 1
>     fact n = n * fact (n - 1)
> 
> (I know this is not the best way to time things but I think it suffices for
> this test.)
> 
> Compiled with ghc --make -O3 test.hs, ./test 500000 runs for 74 seconds.
> Compiling with ghc --make -O3 -threaded -with-rtsopts=-N, ./test 500000
> runs for 82 seconds (and seems to be using 2 cpu cores instead of just 1,
> on a 4-core machine). What gives?
> 
> Mike S Craig



------------------------------

Message: 5
Date: Thu, 01 Dec 2011 02:55:48 -0500
From: "Edward Z. Yang" <[email protected]>
Subject: Re: [Haskell-beginners] Diagnosing : Large memory usage + low
        CPU
To: Hugo Ferreira <[email protected]>
Cc: beginners <[email protected]>
Message-ID: <1322726079-sup-5834@ezyang>
Content-Type: text/plain; charset=UTF-8

Hello Hugo,

Can you do a heap profile (+RTS -hT, or maybe use one of the other
options if you've got a profiling copy lying around)?  Try using
smaller data if it's taking too long; usually the profile will still
look the same, unless it's a particular type of input that is triggering
bad behavior.

There is not enough detail in your code for me to use my psychic
debugging skills, unfortunately.

Edward

Excerpts from Hugo Ferreira's message of Wed Nov 30 09:23:53 -0500 2011:
> Hello,
> 
> On 11/29/2011 10:57 PM, Stephen Tetley wrote:
> > Hi Hugo
> >
> > What is a POSTags and how big do you expect it to be?
> >
> >
> 
> type Token = String
> type Tag = String
> 
> type NGramTag = (Token, Tag, Tag)
> 
> type POSTags = Z.Zipper NGramTag
> 
> > Generally I'd recommend you first try to calculate the size of your
> > data rather than try to strictify things, see Johan Tibell's very
> > useful posts:
> >
> >
> > http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.html
> > http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html
> >
> 
> According to size in String I am expecting a maximum of 50 Mega.
> Profiling (after a painful 80 minutes) shows:
> 
> total alloc = 20,350,382,592 bytes
> 
> Way too much.
> 
> > Once you know the size of your data - you can decide if it is too big
> > to comfortably work with in memory. If it is too big you need to make
> > sure you're are streaming[*] it rather than forcing it into memory.
> >
> > If POSTags is large, I'd be very concerned about the top line of
> > updateState - reversing lists (or sorting them) simply doesn't play
> > well with streaming.
> >
> 
> The zipper does quite a bit of reversing and appending.
> I also need to reverse lists to retain the order of the
> characters (text). I also do sorting but I have eliminated this
> in the tests.
> 
> So my question: how can one "force" the reversing and append?
> Anyone?
> 
> TIA,
> Hugo F.
> 
> >
> > [*] Even in a lazy language like Haskell, streaming data isn't
> > necessarily automatic.
> >
> > _______________________________________________
> > Beginners mailing list
> > [email protected]
> > http://www.haskell.org/mailman/listinfo/beginners
> >
> 



------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 42, Issue 1
****************************************

Reply via email to