Jaroslav Kysela wrote:
>
> On Thu, 20 Feb 2003, Abramo Bagnara wrote:
>
> > Now I'm able to get the same results you see.
> >
> > However I think that we need to extract some results from this data.
> >
> > I'll leave alone MMX optimizations because I want to compare apples with
> > apples.
> >
> > The distributed saturation (also when it's missing the check/repeat
> > concurrency correctness part) costs more than 4 times the ticks needed
> > for a (fully correct wrt concurrency) saturate once approach for the
> > case 2048 8 32768.
> >
> > CPU clock: 1460477150.884593
> > mix_areas0:     86747  0.031975%
> > mix_areas1:     259424 0.095623% (0)
> > mix_areas1_mmx: 253894 0.093585% (0)
> > mix_areas2:     132321 0.048773% (365)
> > mix_areas3:     332411 0.122526% (0)
> >
> > The server based approach has an added cost of an extra context switch
> > every period (about 1500 cycles on my machine i.e.), but this is fully
> > amortized by such an huge difference.
> >
> > What's your opinion?
>
> Interesting is that my Intel P3 CPU has slightly different times:
>
> pnote:/home/perex/alsa/alsa-lib/test # ./code 2048 8 32768
> Scheduler set to Round Robin with priority 99...
> CPU clock: 847.292487Mhz (UP)
>
> Summary (the best times):
> mix_areas_srv : 576382 0.366206%
> mix_areas0    : 556852 0.353798%
> mix_areas1    : 867989 0.551480%
> mix_areas1_mmx: 625144 0.397187%
> mix_areas2    : 903335 0.573937%
>
> areas1/srv ratio     : 1.505927
> areas1_mmx/srv ratio : 1.084600

This is due to a cache poisoning effect, which I find quite surprising.
With a warm cache mix_areas_srv is 3 times faster than with a cold cache,
while the difference is smaller for the other alternatives.

I've modified code.c so that you can test this effect too.

However, I think that the realistic scenario is neither 0 nor 1024 KB of
cache poison.

> I think that we can lose more in the client/server model. Also, note that
> we can use even futexes (if there's a hope that the possible context
> switch is acceptable) and then we can remove the cmpxchg trick and
> write-retry trick and use MMX for parallel saturation of two samples (this
> last can be used in the client/server model, too, indeed).

I really doubt that a futex would be of any help, as it's very difficult to
choose the unit it protects. I also very much like the fact that the
concurrent processes are totally independent. With a futex, if one process
exits badly you're screwed.

What seems more interesting to me in the dmix approach is (as Tomasz has
pointed out) the exceptionally good latency (which is the other side of the
repeated saturation cost). However, we will enjoy this benefit *only* if
pcm_dmix is the last PCM of the chain.

-- 
Abramo Bagnara                       mailto:[EMAIL PROTECTED]

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy
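
For anyone who wants to reproduce the warm versus cold cache difference
discussed above, here is a minimal sketch of how a benchmark might poison
the cache between timed runs. It is not taken from code.c: the names
poison_alloc and poison_cache and the 64-byte cache line size are
assumptions made only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/*
 * Hypothetical helper: write one byte in every cache line of the first
 * `bytes` bytes of `buf`. Walking a buffer larger than the cache this way
 * tends to evict whatever was cached before (e.g. the mix buffers).
 * Returns the number of cache lines touched.
 */
static size_t poison_cache(volatile unsigned char *buf, size_t bytes)
{
    size_t touched = 0;

    for (size_t off = 0; off < bytes; off += CACHE_LINE) {
        buf[off]++;        /* a write forces the line into the cache */
        touched++;
    }
    return touched;
}

/*
 * Hypothetical helper: allocate a zeroed poison buffer of `kb` kilobytes.
 * Returns NULL when kb is 0 (warm-cache case) or when allocation fails.
 */
static unsigned char *poison_alloc(size_t kb)
{
    return kb ? calloc(kb, 1024) : NULL;
}
```

Calling poison_cache(buf, kb * 1024) just before each timed mix call would
approximate a cold cache for a buffer of kb kilobytes, while skipping the
call leaves the cache warm. Intermediate sizes between 0 and 1024 KB are
probably closer to a real workload.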