Hi,

> But we still have an issue with performance, when we are using default
> unwinder, which uses unwind tables. It could be up to 10 times faster to
> use frame based stack unwinder instead "default unwinder".

Switching on the frame pointer typically costs 1-2% performance, so it's a bad idea to use it. However, changing the frame pointer as in the proposed patch will have a much larger cost, both in performance and code size. You'd be lucky if it is less than 10%. This is due to placing the frame pointer at the top rather than the bottom of the frame, which is very inefficient in Thumb-2.

You would need to unwind ~100k times a second before you might see a performance gain. However, you pay the performance cost all the time, even when no unwinding is required. So I don't see this as being a good idea.

If unwind performance is an issue, it would make far more sense to solve that directly. Profiling typically hits the same functions again and again. Callgraph profiling to a fixed depth hits the same functions all the time. So caching is the obvious solution.

Doing real unwinding is also far more accurate than frame pointer based unwinding (the latter doesn't correctly handle leaf functions, entry/exit sequences in non-leaf functions, or shrinkwrapped functions, and this breaks callgraph profiling).

So my question is: is there any point in making code run significantly slower all the time and getting inaccurate unwinding in return?

Cheers,
Wilco
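
To illustrate the caching suggestion, here is a minimal sketch of a direct-mapped cache keyed by PC. It is not taken from any existing unwinder: `unwind_rule`, `slow_lookup`, `cached_lookup` and `simulate_profile` are hypothetical names, and `slow_lookup` is only a stand-in for the expensive unwind-table search and decode, returning made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unwind rule for one PC: how to recover the caller's frame. */
typedef struct {
    int32_t cfa_offset;   /* caller's SP = SP + cfa_offset */
    int32_t ra_offset;    /* return address is stored at CFA + ra_offset */
} unwind_rule;

/* Counts how often the expensive path is taken. */
static unsigned long slow_lookups;

/* Stand-in for searching and decoding the unwind tables.
   The rule it returns is invented purely for illustration. */
static unwind_rule slow_lookup(uintptr_t pc)
{
    slow_lookups++;
    unwind_rule r = { (int32_t)(8 + (pc & 0x18)), -4 };
    return r;
}

#define CACHE_SLOTS 1024u
static_assert((CACHE_SLOTS & (CACHE_SLOTS - 1)) == 0,
              "CACHE_SLOTS must be a power of two");

typedef struct {
    uintptr_t pc;
    unwind_rule rule;
    int valid;
} cache_entry;

static cache_entry cache[CACHE_SLOTS];

/* Direct-mapped cache keyed by PC: a hit skips the table walk entirely. */
static unwind_rule cached_lookup(uintptr_t pc)
{
    /* Code addresses are at least 2-byte aligned, so shift out bit 0
       before indexing to spread entries over all slots. */
    cache_entry *e = &cache[(pc >> 1) & (CACHE_SLOTS - 1)];
    if (!e->valid || e->pc != pc) {
        e->pc = pc;
        e->rule = slow_lookup(pc);
        e->valid = 1;
    }
    return e->rule;
}

/* Simulate a profiler sampling the same call chain `samples` times,
   starting from an empty cache. Returns the number of slow lookups. */
static unsigned long simulate_profile(const uintptr_t *chain, size_t depth,
                                      unsigned samples)
{
    slow_lookups = 0;
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        cache[i].valid = 0;
    for (unsigned s = 0; s < samples; s++)
        for (size_t d = 0; d < depth; d++)
            (void)cached_lookup(chain[d]);
    return slow_lookups;
}
```

With a profiler that keeps sampling the same call chain, only the first sample pays for the table lookups; every later sample is served from the cache. A real implementation would also need to invalidate entries when code is unloaded, which this sketch ignores.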