On Tue, Nov 24, 2009 at 10:54 AM, Arun Sharma <[email protected]> wrote:
> Executing apply_reg_state with the lock held is a problem only for
> UNW_CACHE_GLOBAL.

With lock *not* held. Correct.

> How does the performance of UNW_CACHE_PER_THREAD compare
> in your tests? In google3 tests?

I don't have a good way to measure that.

I have tried to set UNW_CACHE_PER_THREAD as I was debugging this race, but that caused crashes I didn't understand; perhaps I should revisit that.

Hmm, I don't see how it could work at all in current code :(
Doesn't using UNW_CACHE_PER_THREAD require that unw_local_addr_space in x86*/Ginit.c be made a per-thread variable? Otherwise, all threads will share that global, but will not lock it.

For Gperf-simple, there is no discernible difference (data below), but it only uses one thread.

> I'm inclined to apply the more conservative fix #1 until we have more data
> on the cost of the memcpy vs using UNW_CACHE_PER_THREAD.

My concern with fix #1 is that it reduces concurrency in a hot function (apply_reg_state) -- we have CPUs to burn!

Data from running tests/Gperf-simple:

--- current ---
unw_getcontext : cold avg= 150.204 nsec, warm avg= 38.147 nsec
unw_init_local : cold avg= 259.876 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1848.312 min= 1341.956 avg= 1413.348 nsec
global cache : unw_step : 1st= 390.552 min= 131.698 avg= 180.194 nsec
per-thread cache: unw_step : 1st= 390.552 min= 131.698 avg= 171.771 nsec

unw_getcontext : cold avg= 159.740 nsec, warm avg= 47.684 nsec
unw_init_local : cold avg= 278.950 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1941.408 min= 1341.956 avg= 1424.901 nsec
global cache : unw_step : 1st= 342.869 min= 131.698 avg= 167.376 nsec
per-thread cache: unw_step : 1st= 304.268 min= 131.698 avg= 159.256 nsec

unw_getcontext : cold avg= 147.820 nsec, warm avg= 40.531 nsec
unw_init_local : cold avg= 259.876 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1855.124 min= 1360.121 avg= 1414.406 nsec
global cache : unw_step : 1st= 429.153 min= 170.299 avg= 180.939 nsec
per-thread cache: unw_step : 1st= 304.268 min= 131.698 avg= 173.349 nsec

unw_getcontext : cold avg= 138.283 nsec, warm avg= 40.531 nsec
unw_init_local : cold avg= 259.876 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1961.844 min= 1341.956 avg= 1367.185 nsec
global cache : unw_step : 1st= 390.552 min= 131.698 avg= 140.364 nsec
per-thread cache: unw_step : 1st= 267.937 min= 131.698 avg= 134.995 nsec

--- fix #1 ---
unw_getcontext : cold avg= 138.283 nsec, warm avg= 50.068 nsec
unw_init_local : cold avg= 278.950 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1866.477 min= 1341.956 avg= 1409.588 nsec
global cache : unw_step : 1st= 361.034 min= 131.698 avg= 152.169 nsec
per-thread cache: unw_step : 1st= 417.800 min= 131.698 avg= 164.150 nsec

unw_getcontext : cold avg= 150.204 nsec, warm avg= 38.147 nsec
unw_init_local : cold avg= 288.486 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1827.876 min= 1341.956 avg= 1396.537 nsec
global cache : unw_step : 1st= 342.869 min= 170.299 avg= 179.509 nsec
per-thread cache: unw_step : 1st= 295.185 min= 170.299 avg= 175.034 nsec

unw_getcontext : cold avg= 159.740 nsec, warm avg= 50.068 nsec
unw_init_local : cold avg= 290.871 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1914.161 min= 1360.121 avg= 1415.378 nsec
global cache : unw_step : 1st= 283.832 min= 140.780 avg= 145.469 nsec
per-thread cache: unw_step : 1st= 286.102 min= 131.698 avg= 141.995 nsec

unw_getcontext : cold avg= 150.204 nsec, warm avg= 47.684 nsec
unw_init_local : cold avg= 271.797 nsec, warm avg= 47.684 nsec
no cache : unw_step : 1st= 1839.229 min= 1341.956 avg= 1428.347 nsec
global cache : unw_step : 1st= 295.185 min= 131.698 avg= 140.117 nsec
per-thread cache: unw_step : 1st= 286.102 min= 131.698 avg= 164.806 nsec

--- fix #2 ---
unw_getcontext : cold avg= 159.740 nsec, warm avg= 38.147 nsec
unw_init_local : cold avg= 278.950 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 2132.143 min= 1341.956 avg= 1433.247 nsec
global cache : unw_step : 1st= 381.470 min= 161.216 avg= 167.335 nsec
per-thread cache: unw_step : 1st= 351.951 min= 161.216 avg= 163.721 nsec

unw_getcontext : cold avg= 138.283 nsec, warm avg= 40.531 nsec
unw_init_local : cold avg= 259.876 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1868.748 min= 1360.121 avg= 1410.736 nsec
global cache : unw_step : 1st= 351.951 min= 161.216 avg= 167.386 nsec
per-thread cache: unw_step : 1st= 304.268 min= 161.216 avg= 164.086 nsec

unw_getcontext : cold avg= 140.667 nsec, warm avg= 50.068 nsec
unw_init_local : cold avg= 269.413 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1836.958 min= 1341.956 avg= 1377.523 nsec
global cache : unw_step : 1st= 390.552 min= 161.216 avg= 174.246 nsec
per-thread cache: unw_step : 1st= 447.319 min= 161.216 avg= 185.533 nsec

unw_getcontext : cold avg= 150.204 nsec, warm avg= 50.068 nsec
unw_init_local : cold avg= 290.871 nsec, warm avg= 50.068 nsec
no cache : unw_step : 1st= 1923.243 min= 1360.121 avg= 1430.753 nsec
global cache : unw_step : 1st= 324.703 min= 161.216 avg= 167.917 nsec
per-thread cache: unw_step : 1st= 283.832 min= 161.216 avg= 163.593 nsec

-- 
Paul Pluzhnikov

_______________________________________________
Libunwind-devel mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/libunwind-devel
