Brian Cameron wrote:
> Bart:
> 
>> Solaris has a mixer in the kernel; it combines each audio stream
>> together and does the volume adjustment for each stream when
>> mixed (a very small fraction of a second before play). If the
>> volume slider on rhythmbox adjusts the level prior to writing
>> the data to the ringbuffer, we're going to get significant
>> "lag" (4 seconds on my machine).
>>
>> Adequate buffering is important in an audio application,
>> although dealing with real-time action games can be tricky
>> (getting the sound right for Quake on Solaris took a little
>> while).
>>
>> In this case, can GStreamer use a hardware volume control rather
>> than a software one?
> 
> If we wanted to invest the time to hack all GNOME audio programs
> to have a system volume control instead of an application one,
> then we could do this.  However, wouldn't it be better just to
> fix SunAudio to not require such an enormous buffer?
> 
> Brian

I guess I'm just surprised the GStreamer framework doesn't
support hardware (ok, kernel software) mixers/gain controls, though.

My guess is that what's happening is that the timeshare scheduler is
dropping the priority of the gst_task_func thread that's doing all the
work a lot; anything else that needs to run will preempt it.

Here's a typical stack trace of the thread getting all the CPU time:

pstack 1985/160
1985:   rhythmbox
-----------------  lwp# 160 / thread# 160  --------------------
  bff57e6b lwp_park (0, 0, 0)
  bff52196 cond_wait_queue (882bc40, 879bc28, 0, 0) + 3e
  bff52697 _cond_wait (882bc40, 879bc28) + 69
  bff526d7 cond_wait (882bc40, 879bc28) + 24
  bff52713 pthread_cond_wait (882bc40, 879bc28) + 1e
  bb8e3406 gst_queue_chain (87dc6c0, 86ed408) + 6ba
  bd593f78 gst_pad_chain (87dc6c0, 86ed408) + 290
  bd5945db gst_pad_push (86a1568, 86ed408) + 157
  bb92c737 gst_stream_selector_chain (87edb40, 86ed408) + 1ef
  bd593f78 gst_pad_chain (87edb40, 86ed408) + 290
  bd5844b3 gst_proxy_pad_do_chain (8359190, 86ed408) + 2f
  bd593f78 gst_pad_chain (8359190, 86ed408) + 290
  bd5945db gst_pad_push (86a10e8, 86ed408) + 157
  bb6a2b6d flump3dec_drain_avail (86a11a8, 1) + 5cd
  bb6a1e2e flump3dec_sink_chain (86a11a8, 867a408) + c2
  bd593f78 gst_pad_chain (86a11a8, 867a408) + 290
  bd5945db gst_pad_push (83c1e30, 867a408) + 157
  bb753b82 gst_id3demux_chain (86e76f8, 867a408) + 45a
  bd593f78 gst_pad_chain (86e76f8, 867a408) + 290
  bd5945db gst_pad_push (87dc0c0, 867a408) + 157
  bb8e7de5 gst_type_find_element_chain (87dc240, 867a408) + 55
  bd593f78 gst_pad_chain (87dc240, 867a408) + 290
  bd5844b3 gst_proxy_pad_do_chain (889aaa0, 867a408) + 2f
  bd593f78 gst_pad_chain (889aaa0, 867a408) + 290
  bd5945db gst_pad_push (87dc600, 867a408) + 157
  bb8ac905 gst_base_src_loop (87dc600) + 255
  bd5a9024 gst_task_func (8654b40, 85b2768) + 1f0
  beaa5560 g_thread_pool_thread_proxy (83433a8) + c0
  beaa4267 g_thread_create_proxy (87ed2d0) + 11b
  bff57bdd _thr_setup (bc5a1000) + 51
  bff57e30 _lwp_start (bc5a1000, 0, 0, 0, 0, 0)

This thread uses 4.3% of my 2Ghz Opteron.


Since ts_tick processing in the kernel is slaved to the
100 Hz clock, you might try cheating and have the above
thread do a poll(0,0, 1) after writing its payload to the
audio device.  This will cause it to be asleep when
tick processing is done, and should help keep it's priority
up, which will allow you to reduce the need for buffering.

You can see stuff like this w/ prstat -L, which provides
separate lines of output per thread (LWP).  Truss also
groks the -p pid/lwpid syntax, so when you see the following
in prstat -L:

    PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
   1985 barts      70M   27M cpu1    35    0   0:00:05 3.9% rhythmbox/193
   1831 barts     159M   94M sleep   59    0   0:14:52 1.9% firefox-bin/1
    982 barts     124M   97M sleep   59    0   0:14:20 0.8% Xorg/1
   1196 barts      84M   34M sleep   59    0   0:00:19 0.5% gnome-panel/1

You can do

truss -p 1985/193 and see what just that thread is doing.

Some other perf comments:

The gstreamer code does a lot of debugging via gst_debug_log;
is that needed?

The implementation of g_atomic_* is very poor.

 From a few second run w/ plockstat -A -p <pid>:

Mutex block

Count     nsec Lock                         Caller
-------------------------------------------------------------------------------
  3081    58528 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_get+0x3e
   191    67791 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_exchange_and_add+0x33
   116    90832 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_compare_and_exchange+0x3f
  1761     3888 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_get+0x3e
    75    62521 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_pointer_get+0x3e
    63    64554 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_add+0x30
   152     2933 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_exchange_and_add+0x33
    96     3560 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_compare_and_exchange+0x3f
    90     3609 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_pointer_get+0x3e
     9    31446 0x85bb150 
libgstcoreelements.so`gst_queue_loop+0xae
     6    24564 0x85bb150 
libgstcoreelements.so`gst_queue_chain+0xaa
     6    16431 
libgstreamer-0.10.so.0.4.0`.XAByp4pT9jOEWpS.__dbg_functions_mutex+0x4 
libgstreamer-0.10.so.0.4.0`_gst_debug_nameof_funcptr+0x83
    26     3085 0x810b898 
libglib-2.0.so.0.1000.1`g_atomic_int_add+0x30
     3    23952 0x85bb150 
libgstcoreelements.so`gst_queue_chain+0x9b4
     1    20851 0x85bb150                    libc.so.1`cond_wait+0x24
     1     3043 
libgstreamer-0.10.so.0.4.0`.XAByp4pT9jOEWpS.__dbg_functions_mutex+0x4 
libgstreamer-0.10.so.0.4.0`_gst_debug_nameof_funcptr+0x83
#

g_atomic_int_get is really evil.


For a Solaris architecture neutral fix, G_ATOMIC_MEMORY_BARRIER
should be set to membar_consumer() as far as I can tell from the
gnome sources & comments.  In any case, all threads don't need
to share a single mutex; a portable implementation might as well be:

gpointer
g_atomic_pointer_get (gpointer *atomic)
{
   gpointer result;
   GMutex my_atomic_mutex;

   bzero(&my_atomic_mutex, sizeof (GMutex));

   g_mutex_lock (&my_atomic_mutex);
   result = *atomic;
   g_mutex_unlock (&my_atomic_mutex);

   return result;
}


- Bart

-- 
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com              http://blogs.sun.com/barts

Reply via email to