Quoting Jaroslav Kysela <[EMAIL PROTECTED]>:

> On Thu, 20 Feb 2003, Abramo Bagnara wrote:
>
> > Now I'm able to get the same results you see.
> >
> > However I think that we need to extract some results from this data.
> >
> > I'll leave alone MMX optimizations because I want to compare apples with
> > apples.
> >
> > The distributed saturation (also when it's missing the check/repeat
> > concurrency correctness part) costs more than 4 times the ticks needed
> > for a (fully correct wrt concurrency) saturate once approach for the
> > case 2048 8 32768.
> >
> > CPU clock: 1460477150.884593
> > mix_areas0:      86747  0.031975%
> > mix_areas1:     259424  0.095623% (0)
> > mix_areas1_mmx: 253894  0.093585% (0)
> > mix_areas2:     132321  0.048773% (365)
> > mix_areas3:     332411  0.122526% (0)
> >
> > The server based approach has an added cost of an extra context switch
> > every period (about 1500 cycles on my machine i.e.), but this is fully
> > amortized by such an huge difference.
> >
> > What's your opinion?
>
> Interesting is that my Intel P3 CPU has slightly different times:
>
> pnote:/home/perex/alsa/alsa-lib/test # ./code 2048 8 32768
> Scheduler set to Round Robin with priority 99...
> CPU clock: 847.292487Mhz (UP)
>
> Summary (the best times):
> mix_areas_srv :  576382  0.366206%
> mix_areas0    :  556852  0.353798%
> mix_areas1    :  867989  0.551480%
> mix_areas1_mmx:  625144  0.397187%
> mix_areas2    :  903335  0.573937%
>
> areas1/srv ratio     : 1.505927
> areas1_mmx/srv ratio : 1.084600
>
> I think that we can lose more in the client/server model. Also, note that
> we can use even futexes (if there's a hope that the possible context
> switch is acceptable) and then we can remove the cmpxchg trick and
> write-retry trick and use MMX for parallel saturation of two samples (this
> last can be used in the client/server model, too, indeed).
>
> Jaroslav
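For context, the "saturate once" approach the benchmark above compares amounts to summing into a wider accumulator and clamping to the 16-bit sample range once at the end. A minimal sketch follows; the names `saturate_s16` and `mix_saturate_once` are mine for illustration, not the actual `mix_areas*` code:

```c
#include <stdint.h>

/* Clamp a 32-bit sum to the signed 16-bit sample range. */
static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Mix `nsrc` source buffers of `len` samples into dst,
 * saturating only once per destination sample. */
static void mix_saturate_once(int16_t *dst, int16_t **src,
                              unsigned nsrc, unsigned len)
{
    for (unsigned i = 0; i < len; i++) {
        int32_t sum = 0;
        for (unsigned s = 0; s < nsrc; s++)
            sum += src[s][i];
        dst[i] = saturate_s16(sum);
    }
}
```

The distributed variants instead saturate at every accumulation step into the shared buffer, which is where the extra cost (and the cmpxchg/write-retry concurrency tricks) comes from.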
I'm not sure what solution you're proposing here exactly, but it seems to go in line with my train of thought after seeing the results of these tests. It seems that a fast, thread-unsafe implementation could have such a huge speed advantage that the waiting imposed on other processes by global locking would still be compensated for.

To give an example: if we can have a 4 times quicker mixing procedure, then instead of having 3 threads write concurrently for 12 seconds (that's 4 seconds of CPU time per thread), they would write in turns, 1 second each, for a total of 3 seconds. So the 1st thread to gain access could return after 1 second, the 2nd thread after 2 seconds, and the 3rd after 3. That's still better than one thread writing alone (for 4 seconds)! Yes, there is greater latency, but it seems well compensated, at least for a reasonable number of connected sound sources. Anything above 4 doesn't make much sense anyway if our approach is to saturate rather than average; above this, distortion will be very audible. And if we devise a smart locking mechanism, this latency problem can be reduced to a minimum. The locking and unlocking code would be within the mixing function, thus preventing a badly coded application from blocking indefinitely.

A simple locking mechanism I'm considering is the following:

- We maintain a short table of ranges locked by each client (one entry per client).
- Access to the table is synchronized with a single mutex.
- A request to lock a region could be partially satisfied, i.e. if thread 1 has locked offsets 300-500 and thread 2 wants 200-400, it will get access to 200-300, can mix there, and then ask for the rest.

Additionally, the mixing function could be implemented to break the incoming buffer into chunks of, say, 1024 bytes and try to lock and mix those segments in sequence. This would minimize the time spent waiting for other threads.
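The range-lock table described above could be sketched roughly as follows. This is only an illustration of the partial-grant idea under assumptions of mine (a fixed client count, one range per client, offsets as sample indices); `try_lock_range` and `unlock_range` are hypothetical names, not existing alsa-lib API:

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* One locked range per client; begin == end means "unlocked". */
struct range { size_t begin, end; };

static struct range locks[MAX_CLIENTS];
static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Try to lock [begin, end) for `client`.  Returns the granted
 * sub-range starting at `begin`; an empty result (end == begin)
 * means another client holds the first requested offset and we
 * must retry later for the remainder. */
static struct range try_lock_range(int client, size_t begin, size_t end)
{
    struct range granted = { begin, end };
    pthread_mutex_lock(&table_mutex);
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (i == client || locks[i].begin == locks[i].end)
            continue;
        /* Overlap: clip our grant so it stops where the
         * existing lock starts (the partial-grant case). */
        if (locks[i].begin < granted.end && locks[i].end > granted.begin) {
            if (locks[i].begin <= granted.begin)
                granted.end = granted.begin;   /* head is taken */
            else
                granted.end = locks[i].begin;  /* partial grant */
        }
    }
    locks[client] = granted;
    pthread_mutex_unlock(&table_mutex);
    return granted;
}

static void unlock_range(int client)
{
    pthread_mutex_lock(&table_mutex);
    locks[client].begin = locks[client].end = 0;
    pthread_mutex_unlock(&table_mutex);
}
```

With thread 1 holding 300-500, a request from thread 2 for 200-400 comes back as 200-300; the caller mixes that chunk unlocked (since it now owns it exclusively) and loops asking for the rest, which also gives the chunked-mixing behaviour almost for free.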
It means a sound compromise (excuse the pun) between the convenience of not waiting for other threads by effectively synchronizing on a per-sample basis, and the speed afforded by code which doesn't need to care about synchronization, yet is not hindered by global locking. Am I making myself clear, or does this sound totally convoluted?

--------------
Fycio (J.Sobierski)
[EMAIL PROTECTED]

_______________________________________________
Alsa-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/alsa-devel