Greeting,

Jitter vs Throughput
====================

Scenario
--------

I have the following scenario:

CPU with [C] cores
concurrent program
the 1 main thread  uses OpenGL for animated visual output
[W] worker threads uses FFI to lengthy numerical computations

with   the following desires :

(J) minimize jitter    : the 1 main thread needs to be responsive
(T) maximize throughput: idle CPU time is a waste of time

The problem is, I want both (J) and (T)!


Benchmarks
----------

Some rough benchmarks from my 'production' implementation[1] where:

jitter = stddev of actual period              (target period = 40ms)
idle   = C - (real time) / (user+sys time)    (cores used elsewhere)
(but for example Xorg will use some time for the OpenGL output, etc)

    C W jitter idle
    2 0  1.5ms 168% threaded RTS
    2 1  1.9ms  65%     "     "
    2 2  9.4ms  46%     "     "
    2 3 13.3ms  37%     "     "

For comparison, a (very) minimal GLUT program gives:

         0.3ms      threaded RTS
         0.5ms  non-threaded RTS

Picking (J) over (T), then setting W=1 when C=2 gives best jitter
Picking (T) over (J), then setting W=2 when C=2 gives best throughput


Question
--------

What is the best way to re-structure the program to have both low jitter and high throughput?

Options I am considering:

1. worker threads estimate the time needed to complete each job,
   and don't start a job if it is likely to break the deadline
   (bonus points if just those worker Haskell threads running on
   the same OS thread as the main Haskell thread need to pause)

2. change the foreign code to break jobs into smaller pieces,
   for example perhaps something like:

     worker :: IO (IO Bool) -> IO ()
     worker getJob = forever $ do
       job <- getJob
       whileM_ job yield  -- [2]

   instead of

     worker :: IO (IO ()) -> IO ()
     worker = forever . join

3. re-implement the foreign code in Haskell instead of C and hope
   that GHC makes the Haskell go as fast as GCC makes the C go

5. wait (for a long time) for a new RTS with:
     full pre-emption (including interrupting/resuming foreign code)
     user settable thread priorities

(1) is "a fun challenge" (there may be PhDs awarded for less, I imagine)
(2) isn't quite trivial, some scope for tuning subjob chunk size
(3) is boring translation but could lead to interesting benchmarks
    even (especially?) if it fails to be as fast as C

Which would you pick?


Links
-----

[1] http://hackage.haskell.org/package/mandulia
[2] http://hackage.haskell.org/packages/archive/monad-loops/latest/doc/html/Control-Monad-Loops.html#v:whileM_


Thanks for any insight and advice,


Claude
--
http://claudiusmaximus.goto10.org
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to