Thanks a lot for the suggestions. As I mentioned, it really was a toy problem, but I am not getting a significant speedup on a bigger problem either, one where the threads are nicely separated and the problem is very CPU bound. I would be very interested to know about a tool that would point out problems with cache and memory access.
Tomas