On Fri, September 27, 2013 9:37 pm, Artem Bityutskiy wrote:
> On Fri, 2013-09-27 at 21:15 +1000, Patrick Shirkey wrote:
>> On Fri, September 27, 2013 8:24 pm, Artem Bityutskiy wrote:
>> > On Fri, 2013-09-27 at 19:18 +1000, Patrick Shirkey wrote:
>> >> On Fri, September 27, 2013 4:19 pm, Artem Bityutskiy wrote:
>> >> > On Wed, 2013-09-25 at 02:49 +1000, Patrick Shirkey wrote:
>> >> >> Hi,
>> >> >>
>> >> >> A quick update for those who are following this thread.
>> >> >>
>> >> >> We are tracing the audio latency when running a combination of
>> >> >> JACK and PA.
>> >> >>
>> >> >> We are currently looking at the PA Stream Buffer as a potential
>> >> >> bottleneck.
>> >> >>
>> >> >> During testing I have seen latency as low as 4ms round trip but
>> >> >> also as high as 1300ms and the results are not stable on my
>> >> >> hda_intel sound device.
>> >> >
>> >> > I think you earlier said you are using an x86 desktop for
>> >> > testing. What I'd try to do is to prevent deep C-states. Indeed,
>> >> > if the package you run pulseaudio/jack/other related processes
>> >> > on is able to enter a deep C-state, there is an exit latency
>> >> > associated with it.
>> >> >
>> >> > To put the long story short, there is the /dev/cpu_dma_latency
>> >> > file, where you can write the latency you can tolerate (in
>> >> > microseconds). The kernel will translate this to the deepest
>> >> > C-state the processor can enter.
>> >> >
>> >> > You can write 0 there, which will mean that CPU won't ever enter
>> >> > any C-state and will busy-loop when idle. Bad for power
>> >> > consumption. But you can just experiment if this helps to lessen
>> >> > the latency deviation that you observe.
>> >> >
>> >> > You can write a larger number, then CPU will enter C1 at least,
>> >> > which is already a lot better for PM.
>> >> >
>> >> > You can verify which C-states you hit with the 'turbostat' tool
>> >> > or powertop. The former comes, I think, from kernel-tools package
>> >> > in Fedora. Play with latency number and use them to check which
>> >> > C-states this corresponds to.
>> >> >
>> >> > Ah, and there is a trick. You should open /dev/cpu_dma_latency,
>> >> > write your latency (as ascii or binary, both are ok), and _do not
>> >> > close it_. As soon as you close it, the kernel will switch to the
>> >> > default latency constraint.
>> >> >
>> >> > Also, advanced drivers usually use the kernel PMQoS
>> >> > infrastructure and instruct the system when they cannot tolerate
>> >> > high latency.
>> >> >
>> >> > When I do 'git grep PM_QOS_CPU_DMA_LATENCY' in the kernel, I do
>> >> > not see the HDA driver doing this.
>> >> >
>> >> > Anyway, this may not solve the issue, but I'd suggest to try out
>> >> > if it at least partially helps. And I am very interested to hear
>> >> > if it does or not, or maybe you already tried this out.
>> >> >
>> >>
>> >> I can't get turbostat with apt on debian as it has been removed
>> >> from the acpica-tools package.
>> >
>> > Ok. You can easily compile it yourself if you want. It is in the
>> > kernel tree in tools/power/x86/turbostat/, where you just type
>> > 'make'.
>> >
>> > Anyway, the only reason I refer to this tool is that you can use it
>> > to check the C-state residency statistics, and how C-state residency
>> > is affected by /dev/cpu_dma_latency settings.
>> >
>> >> Using powertop I see these stats with /dev/cpu_dma_latency set to 0:
>> >
>> > Did you open the file, write 0, and keep the file open? It does not
>> > look like it, because I see you hit C3.
>> >
>> > I do not know how to do this from console, I wrote a custom script
>> > for this.
>> >
>> > I have a python script which can do this, I can send it to you, let
>> > me know in a private e-mail.
>> >
>> >> Idle
>> >> Package      |  CPU 0
>> >> POLL   0.0%  |  POLL   0.0%   0.0 ms
>> >> C1     0.3%  |  C1     0.4%   0.1 ms
>> >> C2    17.8%  |  C2    17.2%   0.2 ms
>> >> C3    13.1%  |  C3    12.0%   0.1 ms
>> >
>> > See, you are hitting C2 and C3. C3 has the highest exit latency. But
>> > I do not know what would that be for your platform.
>> >
>>
>> I see results similar to this with powertop while using your script:
>>
>> ./pmqos set cpu-latency 0
>
> Hmm, OK, I do not have comments now. I use IvyBridge, and there this
> works... You obviously have something older, a Westmere? Can you send me
> your /proc/cpuinfo ? I do not promise to come with suggestions, but will
> try to check few things.
>
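
For anyone following along, the open-write-hold trick quoted above, and the open question about exit latencies, can be sketched in Python roughly as below. This is not the pmqos script mentioned in the thread; the function names are made up for illustration. It assumes root access, a kernel that exposes /dev/cpu_dma_latency and the cpuidle sysfs files (name, latency), and that the latency value is given in microseconds as the kernel PM QoS documentation describes. The value is written in the binary form (a native 32-bit integer), one of the two forms mentioned in the quoted text.

```python
import glob
import os
import struct
from contextlib import contextmanager


@contextmanager
def hold_cpu_dma_latency(latency_us, path="/dev/cpu_dma_latency"):
    """Request a CPU latency limit and hold it until the block exits.

    The request only lasts while the file descriptor stays open, so the
    descriptor is closed (and the default restored) on leaving the block.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # Binary form: one native 32-bit signed integer, in microseconds.
        os.write(fd, struct.pack("i", latency_us))
        yield fd
    finally:
        os.close(fd)


def cstate_exit_latencies(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return [(state_name, exit_latency_us), ...] for one CPU.

    Reads the per-state 'name' and 'latency' files under cpuidle/stateN.
    """
    pattern = os.path.join(sysfs_root, f"cpu{cpu}", "cpuidle", "state*")
    # Sort numerically so that state10 does not come before state2.
    state_dirs = sorted(
        glob.glob(pattern),
        key=lambda p: int(os.path.basename(p)[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state_dir, "latency")) as f:
            latency_us = int(f.read().strip())
        states.append((name, latency_us))
    return states


# Example use (as root), keeping the request alive for a test run:
#
#     import time
#     print(cstate_exit_latencies(0))
#     with hold_cpu_dma_latency(0):
#         time.sleep(60)   # run the JACK + PA latency test meanwhile
```

Running powertop in another terminal while the block is held should show whether the deeper C-states are still being entered.
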

Thanks for taking the time to provide this feedback.

This laptop is about 5 years old. It has similar capabilities to modern
mobile hardware in that it is a powerful dual-core machine with an onboard
hda_intel audio chipset, but it's really just a test system for this
process, to attempt to find all the potential bottlenecks.

Everything so far helps in the decision-making process if it turns out
that PA needs fixing for the JACK + PA combination. The time spent on the
fixes might not be worth it in comparison to other options that have been
discussed.

It could be useful to run these tests on the Medfield chips too to get a
wider dataset.

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
stepping        : 6
microcode       : 0x48
cpu MHz         : 1830.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow
bogomips        : 3657.30
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
stepping        : 6
microcode       : 0x48
cpu MHz         : 1830.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow
bogomips        : 3657.45
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

--
Patrick Shirkey
Boost Hardware Ltd

_______________________________________________
General mailing list
[email protected]
https://lists.tizen.org/listinfo/general
