Hi meep community,

I've been trying to isolate a bottleneck in my simulations. Running on a quad-
core i7 I can't seem to get full CPU utilization; in fact, I see no better 
performance running 4 than running 2 (these are simultaneous, independent 
simulations, so each one's memory is isolated from the others). I'm wondering 
whether cache misses from the memory-access pattern are causing the bottleneck.

This got me looking through the code, and I have a question about the data 
layout. In LOOP_OVER_VOL_OWNED we go from little_owned_corner0 to big_corner 
of the grid volume, where little_owned_corner0 = little_corner() + 
one_ivec(dim)*2 - iyee_shift(c), dim being the dimension of the ivec and c the 
component. Since the components of iyee_shift are 0 or 1, little_owned_corner0 
can never be the (0,0,0) data block in 3D. Right?

I ask because later, when we loop over the ivecs, this makes the stride one 
greater than the loop index, and that jump happens every time we roll over 
from one loop to the next-outermost nested loop. I'm not sure how well the 
processor can detect that pattern (I don't know CPU internals that well). For 
people running large simulations I suspect this isn't a problem, but I've been 
running a large number of simulations at lower resolution, and I wonder if 
anyone could shed a little light on the topic.

The thing that got me looking at the stride issue was a 10% speedup I got by 
rotating my coordinate system, once I realized the loop runs through the z 
axis first, followed by y. My original system had x as its largest dimension.

Thanks!
Tim

Tim Saucer
Physics 236/241/261 Lead GSI
Sih Research Group PhD Candidate
Office: 4409 Randall, Lab: 1277 Randall
Department of Physics, University of Michigan


_______________________________________________
meep-discuss mailing list
[email protected]
http://ab-initio.mit.edu/cgi-bin/mailman/listinfo/meep-discuss