Re: [Libunwind-devel] non-local return scalability bottleneck

D'Alessandro, Luke K Thu, 03 Dec 2015 09:29:39 -0800

> On Dec 3, 2015, at 10:34 AM, D'Alessandro, Luke K <[email protected]> wrote:
> 
>> 
>> On Dec 3, 2015, at 7:30 AM, D'Alessandro, Luke K <[email protected]> wrote:
>> 
>> Hi All,
>> 
>> I have a C library that commonly uses a custom setjmp/longjmp for non-local 
>> return. I’m trying to add support for intermediate C++ code, which means I 
>> need to return through frames that might have RAII destructors that need to 
>> run. I’m attempting to use `_Unwind_ForcedUnwind()` to perform this
>> operation. It works fine; however, there is a serious scalability bottleneck
>> that I’m trying to track down.
>> 
>> I’m using the 1.1 release, and I’ve modified `x86_64_local_addr_space_init()`
>> to set the default caching policy to UNW_CACHE_PER_THREAD. I did this
>> statically because I couldn’t figure out where to call
>> `unw_set_caching_policy()` so that the change would take effect; it appears
>> that the address space is created inside the call to `_Unwind_ForcedUnwind()`…?
>> 
>> In any case, I still see the app hammering away at a lock. I see an init
>> lock in `tdep_init()`, but I doubt that’s an issue. I also see a lock in
>> `trace_cache_get_unthreaded`, which I don’t think I should be hitting. If
>> someone could point me to the likely issue, that would be great; if there
>> is something fundamentally non-scalable about reading the DWARF information
>> and unwinding, that would be useful to know too.
> 
> Okay, to answer my own question a bit.
> 
> Based on the `perf record -g` output below, there appears to be a lock inside
> `dl_iterate_phdr` that gets hit every time `fetch_proc_info` runs. There is
> also locking in `dwarf_get` that happens occasionally. Is caching supposed to
> elide the `dl_iterate_phdr` hits? Could this be user error on my part?

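For readers unfamiliar with the technique in the first message, here is a minimal sketch of a non-local return through `_Unwind_ForcedUnwind()`. It assumes GCC's `<unwind.h>` on x86_64 Linux with unwind tables enabled (the default there), and it is not the poster's library: `forced_return_demo()`, `middle()`, `innermost()`, `stop_fn()` and the `marker` variable are hypothetical. The stop function is called once per frame, before that frame's personality routine (which is what runs C++ destructors), and it `longjmp()`s back once the unwinder has walked past every frame below the one that called `setjmp()`. Each of those per-frame steps needs the frame's unwind info, which is where the `fetch_proc_info` cost in the profile below comes from.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

static jmp_buf target;            /* where the non-local return lands */
static uintptr_t target_marker;   /* address of a local in the setjmp frame */
static int frames_seen;           /* frames the stop function let through */
static struct _Unwind_Exception forced_exc;
static volatile int returned_normally;

/* Never called in this sketch: the stop function longjmps first. */
static void exc_cleanup(_Unwind_Reason_Code reason, struct _Unwind_Exception *exc) {
    (void)reason;
    (void)exc;
}

/* Called for each frame before its personality routine runs. */
static _Unwind_Reason_Code stop_fn(int version, _Unwind_Action actions,
                                   _Unwind_Exception_Class cls,
                                   struct _Unwind_Exception *exc,
                                   struct _Unwind_Context *ctx, void *param) {
    (void)version; (void)cls; (void)exc; (void)param;
    if (actions & _UA_END_OF_STACK)
        longjmp(target, 2);           /* target frame was never found */
    /* Stacks grow down: a CFA above the marker means every frame below
       the setjmp frame has already been handed to us. */
    if ((uintptr_t)_Unwind_GetCFA(ctx) > target_marker)
        longjmp(target, 1);
    frames_seen++;
    return _URC_NO_REASON;            /* keep unwinding */
}

__attribute__((noinline)) static void innermost(void) {
    forced_exc.exception_class = 0x4c554b4500000000ULL; /* arbitrary tag */
    forced_exc.exception_cleanup = exc_cleanup;
    _Unwind_ForcedUnwind(&forced_exc, stop_fn, NULL);
    returned_normally = 1;            /* reached only if unwinding failed */
}

__attribute__((noinline)) static void middle(void) {
    innermost();
    returned_normally = 1;            /* also prevents a tail call */
}

/* Returns 1 when the forced unwind brought control back to this frame. */
int forced_return_demo(void) {
    volatile char marker = 0;         /* lives in the setjmp frame */
    target_marker = (uintptr_t)&marker;
    frames_seen = 0;
    switch (setjmp(target)) {
    case 0:
        middle();
        return 0;                     /* not reached if unwinding works */
    case 1:
        return 1;
    default:
        return 2;
    }
}
```

Because `_Unwind_GetCFA()` may report either a frame's CFA or its stack pointer depending on the unwinder, the strict `>` comparison can fire at the `setjmp()` frame or at its caller; in both cases every intermediate frame has already been passed to the stop function, and the `longjmp()` target is still live.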

Okay, I think I see why we keep hitting `dl_iterate_phdr()`. The
`fetch_proc_info()` call uses it, and `dwarf_make_proc_info()` calls
`fetch_proc_info()` unconditionally because its cache lookup is compiled out
with `#if 0` (see
http://git.savannah.gnu.org/gitweb/?p=libunwind.git;a=blob;f=src/dwarf/Gparser.c;h=3a47255c4a1afa217d1ecc99723babdc92cffec9;hb=HEAD#l924):

```
HIDDEN int
dwarf_make_proc_info (struct dwarf_cursor *c)
{
#if 0
    if (c->as->caching_policy == UNW_CACHE_NONE
        || get_cached_proc_info (c) < 0)
#endif
        /* Lookup it up the slow way... */
        return fetch_proc_info (c, c->ip, 0);
    return 0;
}
```
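To make that cost concrete, here is a sketch of the kind of lookup the slow path performs: finding which loaded object contains an IP by walking every object's `PT_LOAD` segments with `dl_iterate_phdr()`. The `ip_search` struct and `lookup_object()` helper are hypothetical, and this simplifies what libunwind actually does (it goes on to locate the unwind tables in the matching object, e.g. via its `PT_GNU_EH_FRAME` segment). In glibc, `dl_iterate_phdr()` holds a loader-wide lock for the whole iteration, so many threads doing this for every frame serialize on that lock, which is consistent with the `__lll_lock_wait` samples under `fetch_proc_info` in the profile.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical search state: the IP we want and what we found. */
struct ip_search {
    uintptr_t ip;
    const char *object_name;   /* dlpi_name of the matching object */
    uintptr_t load_base;       /* dlpi_addr of the matching object */
    int found;
};

/* Invoked once per loaded object; in glibc the loader lock is held here. */
static int find_object_cb(struct dl_phdr_info *info, size_t size, void *data) {
    struct ip_search *s = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;
        if (s->ip >= start && s->ip < start + ph->p_memsz) {
            s->object_name = info->dlpi_name;
            s->load_base = info->dlpi_addr;
            s->found = 1;
            return 1;          /* non-zero stops the iteration */
        }
    }
    return 0;                  /* keep looking */
}

/* One uncached lookup: every call walks the whole loaded-object list. */
static int lookup_object(uintptr_t ip, struct ip_search *out) {
    out->ip = ip;
    out->object_name = NULL;
    out->load_base = 0;
    out->found = 0;
    dl_iterate_phdr(find_object_cb, out);
    return out->found;
}
```

Calling `lookup_object()` once per frame, as an uncached unwinder effectively does, means taking that lock once per frame per unwind.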

So the question becomes: what is standing in the way of a `get_cached_proc_info()`
implementation? Is it just a “TODO/patches welcome” item, or is there something
fundamentally difficult going on here?

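As a purely illustrative sketch of what a `get_cached_proc_info()` could look like (this is not libunwind's code, and every name here is hypothetical): a per-thread, direct-mapped cache keyed by IP that only falls back to the slow, lock-taking lookup on a miss. `slow_lookup()` is a stand-in that pretends every function is 64 bytes long and counts how often it is called. Keeping the cache thread-local means hits need no lock at all, at the price of each thread warming its own copy. One difficulty any real implementation has to handle is invalidation: entries must be flushed when a shared object is unloaded (for example after `dlclose()`), or they may describe code that is no longer mapped, which is what the hypothetical `flush_cache()` stands for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for unw_proc_info_t: only what the cache needs. */
struct proc_info {
    uintptr_t start_ip, end_ip;   /* [start_ip, end_ip) covered by entry */
    uintptr_t unwind_info;        /* opaque handle, e.g. to an FDE */
};

#define CACHE_SLOTS 256           /* hypothetical size; a power of two */

struct cache_slot {
    struct proc_info pi;
    int valid;
};

/* One cache per thread, so a hit takes no lock. */
static _Thread_local struct cache_slot cache[CACHE_SLOTS];
static _Thread_local unsigned slow_lookups;

/* Hypothetical slow path standing in for fetch_proc_info() and its
   dl_iterate_phdr() walk; pretends every function is 64 bytes long. */
static int slow_lookup(uintptr_t ip, struct proc_info *out) {
    slow_lookups++;
    out->start_ip = ip & ~(uintptr_t)63;
    out->end_ip = out->start_ip + 64;
    out->unwind_info = out->start_ip;   /* placeholder */
    return 0;
}

static int cached_lookup(uintptr_t ip, struct proc_info *out) {
    struct cache_slot *slot = &cache[(ip >> 4) & (CACHE_SLOTS - 1)];
    if (slot->valid && ip >= slot->pi.start_ip && ip < slot->pi.end_ip) {
        *out = slot->pi;                /* hit: no global lock */
        return 0;
    }
    if (slow_lookup(ip, out) < 0)
        return -1;
    slot->pi = *out;                    /* miss: remember the answer */
    slot->valid = 1;
    return 0;
}

/* Must run when code is unloaded, or stale entries could be returned. */
static void flush_cache(void) {
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        cache[i].valid = 0;
}
```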
Thanks,
Luke

> 
> Thanks,
> Luke
> 
> ```
> # Children      Self  Command    Shared Object       Symbol
> # ........  ........  .........  ..................  ..............................................
> #
>    84.80%     0.01%  fibonacci  [kernel.kallsyms]   [k] system_call_fastpath
>            |
>            ---system_call_fastpath
>               |          
>               |--52.91%-- __lll_unlock_wake
>               |          |          
>               |          |--36.55%-- 0x100000000
>               |          |          
>               |          |--27.51%-- validate_mem
>               |          |          access_mem
>               |          |          |          
>               |          |          |--66.90%-- dwarf_get
>               |          |          |          apply_reg_state
>               |          |          |          _ULx86_64_dwarf_find_save_locs
>               |          |          |          0
>               |          |          |          
>               |          |           --33.10%-- dwarf_get
>               |          |                     access_mem
>               |          |                     dwarf_get
>               |          |                     apply_reg_state
>               |          |                     _ULx86_64_dwarf_find_save_locs
>               |          |                     0
>               |          |          
>               |          |--26.93%-- fetch_proc_info
>               |          |          _ULx86_64_dwarf_make_proc_info
>               |          |          _ULx86_64_get_proc_info
>               |          |          _Unwind_ForcedUnwind
>               |          |          
>               |           --9.01%-- dwarf_get
>               |                     access_mem
>               |                     dwarf_get
>               |                     apply_reg_state
>               |                     _ULx86_64_dwarf_find_save_locs
>               |                     0
>               |          
>               |--45.92%-- __lll_lock_wait
>               |          |          
>               |          |--99.99%-- fetch_proc_info
>               |          |          _ULx86_64_dwarf_make_proc_info
>               |          |          _ULx86_64_get_proc_info
>               |          |          _Unwind_ForcedUnwind
> ```
> 
>> 
>> Thanks,
>> Luke
>> _______________________________________________
>> Libunwind-devel mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/libunwind-devel