Hi,

First of all, don't be fooled by the alloc statistic. That is not 3GB of memory residency; it's 3GB of total allocation, interspersed with lots of garbage collections. In the same way, measuring how many times malloc was called in a C program doesn't necessarily indicate memory residency. Using +RTS -s, it looks like your program uses around 10MB at any one time.

As for the speed, your program is doing a lot of conversions that aren't necessary. CFloat has the Num and Floating instances needed to use sin, so you're better off making everything a CFloat rather than converting to and from Float.

I took your program and ironed it out a bit (you were also doing an extra readIORef on top of the modifyIORef, which you didn't need), and used Criterion to test the speed. Here's the program in its entirety (you'll need to "cabal install criterion"):

===
import Data.IORef
import Foreign.C.Types
import Criterion.Main

newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat)
newSinWave sampleRate freq =
 do ioref <- newIORef (0::Integer)
    let multiplier = 2 * pi * freq /
                     (fromIntegral sampleRate)

    return (\ _ -> {-# SCC "sinWave" #-}
             do t <- readIORef ioref
                modifyIORef ioref (+1)
                return $ fromRational $
                  toRational $
                  sin (fromIntegral t *
                       multiplier))

newSinWave' :: Int -> Float -> IO (CFloat -> IO CFloat)
newSinWave' sampleRate freq =
 do ioref <- newIORef 0
    let multiplier = 2 * pi * (realToFrac freq) /
                     (fromIntegral sampleRate)

    return (\ _ -> {-# SCC "sinWave'" #-}
             do t <- readIORef ioref
                writeIORef ioref (t+1)
                return $ sin (t * multiplier))

runLots :: (a -> IO a) -> a -> IO a
runLots f = go 10000
  where
    go 0 !x = return x
    go n !x = f x >>= go (n - 1)

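main :: IO ()
main = do f <- newSinWave 44100 100
          g <- newSinWave' 44100 100

          -- Note: with criterion 1.0 and later, an IO action has to be wrapped
          -- explicitly, e.g. bench "old" $ whnfIO (runLots f 0).
          defaultMain [bench "old" $ runLots f 0, bench "new" $ runLots g 0]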
===
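
As a side note on the conversion point, here is a minimal sketch of the three options in isolation. The names viaRational, direct and sinC are made up for this illustration and are not part of the program above. It only shows that the round trip through Rational and a plain realToFrac give the same CFloat, and that sin can be applied to a CFloat with no conversion at all; it makes no claim about how much each one costs.

```haskell
import Foreign.C.Types (CFloat)

-- Round trip through Rational, as in the original callback.
-- (Hypothetical helper name, for illustration only.)
viaRational :: Float -> CFloat
viaRational = fromRational . toRational

-- The same conversion written with realToFrac.
direct :: Float -> CFloat
direct = realToFrac

-- sin used at type CFloat directly, via CFloat's Floating instance.
sinC :: CFloat -> CFloat
sinC = sin
```

If the counter and the multiplier are both kept as CFloat, as in newSinWave', only the sinC form is needed inside the callback.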

And here's the output from Criterion on my machine, compiled with -XBangPatterns -O1 -rtsopts:

===
benchmarking old
collecting 100 samples, 1 iterations each, in estimated 10.54111 s
bootstrapping with 100000 resamples
mean: 116.4734 ms, lb 116.2565 ms, ub 117.1492 ms, ci 0.950
std dev: 1.794715 ms, lb 626.6683 us, ub 3.992824 ms, ci 0.950
found 5 outliers among 100 samples (5.0%)
  1 (1.0%) low severe
  3 (3.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 0.993%
variance is unaffected by outliers

benchmarking new
collecting 100 samples, 2 iterations each, in estimated 1.417208 s
bootstrapping with 100000 resamples
mean: 10.33277 ms, lb 10.15559 ms, ub 10.50883 ms, ci 0.950
std dev: 904.9297 us, lb 845.3293 us, ub 973.6881 us, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers
===

So unless I've done something wrong in the methodology (always possible), that's made it roughly eleven times faster (a mean of about 116 ms down to about 10 ms). And here's the output from +RTS -s:

===
   6,458,290,512 bytes allocated in the heap
      10,855,744 bytes copied during GC
       5,522,696 bytes maximum residency (5 sample(s))
       3,194,696 bytes maximum slop
              13 MB total memory in use (0 MB lost due to fragmentation)
===
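
If you want to see the allocation versus residency distinction from inside a program rather than from the -s summary, something like the following might do. It is only a sketch, under the assumption that the program is started with +RTS -T so the runtime collects statistics; churn and allocVsResidency are hypothetical names invented for this example.

```haskell
import Control.Monad (forM_)
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Bump an Integer counter n times. Each step may allocate a fresh boxed
-- value, but the previous one becomes garbage straight away.
churn :: Int -> IO Integer
churn n = do
  ref <- newIORef 0
  forM_ [1 .. n] $ \_ -> do
    t <- readIORef ref
    writeIORef ref $! t + 1
  readIORef ref

-- (total bytes allocated so far, largest live heap seen at a major GC),
-- or Nothing when runtime statistics are not enabled (e.g. no +RTS -T).
allocVsResidency :: IO (Maybe (Word64, Word64))
allocVsResidency = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then return Nothing
    else do
      stats <- getRTSStats
      return (Just (allocated_bytes stats, max_live_bytes stats))
```

Running churn with a large count and then calling allocVsResidency should typically show the first number growing with the count while the second stays small, which is the same picture as the "bytes allocated" and "maximum residency" lines above.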

Hope that helps,

Neil.

On 29/01/2011 16:29, Edward Amsden wrote:
I recently got the jack package from hackage working again. For those
unfamiliar, jack is a callback-based audio server.
Writing a client entails importing the C library or its bindings (the
Sound.JACK module in my case), creating a client and
some ports (using provided library functions), and then registering
callbacks for audio processing.

I've written a simple program that outputs a sine wave through JACK.
The server's sample rate is 44100, which means that this function must
be called 44100 times/second (it is buffered, so generally that would
be in chunks of 64, 128, or 256). It is an IO function,
which gives the only opportunity to keep track of time:

(Note that the function produced by newSinWave is the one actually
registered as a callback.)

newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat)
newSinWave sampleRate freq =
  do ioref <- newIORef (0::Integer)
     let multiplier = 2 * pi * freq /
                      (fromIntegral sampleRate)

     return (\_ -> {-# SCC "sinWave" #-}
              do t <- readIORef ioref
                 modifyIORef ioref (+1)
                 return $ fromRational $
                   toRational $
                   sin (fromIntegral t *
                        multiplier))

I profiled this since when my program registered with the jack server
and started taking callbacks, it was using about 75% cpu
(in contrast, the echo program included with the jack package uses
less than 2%). The following two lines are of concern to me:

"total alloc = 3,040,397,164 bytes  (excludes profiling overheads)"
My program uses 3GB of virtual memory over a 15 second run?


"  sinWave  Main   341     1173295 100.0  100.0     0.0    0.0"
and ~100% of that 75% cpu time is being spent in my callback.

Is there something I'm doing wrong? At the very least, it shouldn't be
using 3GB of memory. The only thing that needs to be saved between
callbacks is the IORef, which is storing an Integer. I assume that
evaluating that whole construct in Haskell may be too much timewise to
put in a sound callback (or perhaps not), but 3GB of memory is ridiculous.

Thoughts/hints/"you're doing it wrong" anyone?



_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
