Hello everyone,

I have an application on x86_64 that uses the perf_event_open syscall and the mmap mechanism to obtain samples of its own threads. Samples are triggered by the HW_REF_CPU_CYCLES counter with the sampling frequency set to 10000. To perform a stack walk in the application code, the samples include the user-level SP, BP and IP registers as well as 16K of the user-level stack. This works very well for samples taken while execution is in user code.
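For reference, the setup described above corresponds roughly to the following attribute initialization. This is only a minimal sketch, not my actual code: the helper name init_sampling_attr, its exclude_kernel parameter, and the choices of PERF_SAMPLE_TID and disabled = 1 are assumptions made for illustration.

```c
#include <string.h>
#include <linux/perf_event.h>
#include <asm/perf_regs.h>   /* PERF_REG_X86_* register indices (x86 only) */

/* Hypothetical helper: fill a perf_event_attr for frequency-based sampling
 * of user-level registers plus a copy of the user stack. */
void init_sampling_attr(struct perf_event_attr *attr, int exclude_kernel)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);

    /* Sample on reference CPU cycles at about 10000 samples per second. */
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_REF_CPU_CYCLES;
    attr->freq = 1;
    attr->sample_freq = 10000;

    /* Record user-level registers and a fragment of the user stack.
     * PERF_SAMPLE_TID is an assumption, added to tell threads apart. */
    attr->sample_type = PERF_SAMPLE_TID |
                        PERF_SAMPLE_REGS_USER |
                        PERF_SAMPLE_STACK_USER;

    /* Only SP, BP and IP are requested, as a bit mask of register indices. */
    attr->sample_regs_user = (1ULL << PERF_REG_X86_SP) |
                             (1ULL << PERF_REG_X86_BP) |
                             (1ULL << PERF_REG_X86_IP);

    /* 16K of user stack per sample. */
    attr->sample_stack_user = 16 * 1024;

    /* 1: samples only while in user code; 0: also sample in the kernel. */
    attr->exclude_kernel = exclude_kernel ? 1 : 0;

    /* Assumption: start disabled and enable explicitly later. */
    attr->disabled = 1;
}
```

The filled structure would then be passed to perf_event_open through syscall(2), since glibc does not provide a wrapper for it.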
When I also want to observe where threads are waiting by taking samples in the kernel (exclude_kernel = 0), it frequently seems like a chunk from the top of the stack is missing in the sample. Most of the time, the IP register indicates that execution is in pthread_cond_wait() or write(). But when I look at the sampled stack fragment in the debugger, I don't see any return addresses to code that uses these functions. Instead, it looks as if the stack fragment in the sample was taken at a significant offset from the actual top of the stack. The stack pointer included in the sample, however, appears to always match the start of the sample's stack fragment.

When I additionally sample the most recent calls and returns with the branch stack (Intel Haswell LBR), I get a realistic call chain that cannot be found in the sample's stack fragment.

I experienced this issue with Linux 3.11.10 on openSUSE 13.1, and 3.14.0 on openSUSE Tumbleweed. As a possible workaround, I tried using the SW_CONTEXT_SWITCHES event, but the resulting samples have the same problem.

Have you observed this problem before? Is there a possible workaround?

Thanks,
Peter Hofer