Hi all and Eddie.

> I've tried running some jobs on a MacPro with Dual quad core processors
> which are hyperthreaded (the OS recognizes 16 cores, also, I'm still using
> Trilinos 9.0.3 and FiPy 2.02).  I typically see pretty linear scaling when
> running jobs up to the number of physical cores, but I see no improvement,
> or even an increase in total job time if I go above that limit (e.g. 8 cores
> is ~half the time of 4, but 16 is no better or perhaps worse than 8) for a
> 256x256 grid.

Does "pretty linear" mean ~8 times faster when you run with "mpirun
-np 8", i.e. the efficiency of each real core is close to 100%?
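Just to be concrete about what I mean by efficiency, here is a small
Python sketch (my own helper, not FiPy code):

```python
def parallel_efficiency(t_serial, t_parallel, nprocs):
    """Speedup (t_serial / t_parallel) divided by the process count.

    1.0 means ideal linear scaling; ~0.5 would match "16 cores is no
    better than 8" after doubling the core count.
    """
    return t_serial / (nprocs * t_parallel)
```

For example, a job taking 80 s serially and 10 s on 8 cores has an
efficiency of 80 / (8 * 10) = 1.0, i.e. perfectly linear scaling.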

> For larger systems (1024x1024) I've seen pretty good performance on our
> cluster at 64 or even 128 cores (with 8 cores per node and no hyperthreading
> there).  Here, the performance improvement seems to drop off more slowly
> with an increase in cores compared to the transition from linear to no
> improvement when attempting to use the hyperthreaded cores on our desktop
> machines.

I haven't tried running on a cluster yet, but what you are saying
means that communication over anything other than the motherboard does
not really affect your problem, i.e. your problem doesn't involve much
messaging.
I'm now starting to doubt whether I have 8 real cores or 4
hyperthreaded ones. I just took the number of cores from the top
command. I'll find out about it, but in my case 8 does give an overall
performance increase compared to 4.
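One quick way to check is to compare the logical core count the OS
reports with a best-effort physical count. A sketch using standard OS
interfaces (the `sysctl` branch is macOS-only and `/proc/cpuinfo` is
Linux-only, so one of the two will simply fail elsewhere and the
function falls back to None):

```python
import os
import subprocess

def logical_cores():
    """Logical (possibly hyperthreaded) core count seen by the OS."""
    return os.cpu_count()

def physical_cores():
    """Best-effort physical core count; None if it can't be determined."""
    try:
        # macOS: hw.physicalcpu vs. hw.logicalcpu
        out = subprocess.run(["sysctl", "-n", "hw.physicalcpu"],
                             capture_output=True, text=True, check=True)
        return int(out.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        pass
    try:
        # Linux: hyperthread siblings share a (physical id, core id)
        # pair, so counting distinct pairs gives physical cores
        pairs = set()
        phys = core = None
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("physical id"):
                    phys = line.split(":")[1].strip()
                elif line.startswith("core id"):
                    core = line.split(":")[1].strip()
                elif line.strip() == "" and phys is not None:
                    pairs.add((phys, core))
                    phys = core = None
        if phys is not None:
            pairs.add((phys, core))
        return len(pairs) or None
    except OSError:
        return None
```

If `physical_cores()` comes back as half of `logical_cores()`, the
extra "cores" in top are hyperthreads.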

> Based on your experience, it sounds like the aspect ratio of the system can
> be pretty important which is interesting as my problem will probably require
> non-square domains.  I can let you know when I get around to testing this
> out, and thanks for sharing!

Actually, in my case the cells were becoming *more square* as I
increased the cell count in one dimension, so I had non-square cells.
After this post I decided to make the cells square by applying some
rescalings (which are actually very inconvenient in my specific
problem) and repeated some tests. This didn't improve the performance
efficiency; instead, I couldn't get more than 50-60% efficiency per
core, because keeping the cells square forced me to also increase the
dimension along which the communication is done. So the cell geometry
doesn't play a huge role.

But to be sure, I'd like to ask the developers here whether FiPy
behaves differently, and whether the solution is strongly affected,
when one side of a cell is, let's say, 10 times the other.

All in all, I now think that the performance efficiency strongly
depends on the problem type. I have quite a complicated problem with
several convection terms, sources, and also a diffusion term, so
communicating information between the nodes will always take
noticeable time compared to the calculations.
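That intuition can be put in numbers with a back-of-the-envelope
sketch. I'm assuming a 1D slab decomposition along one axis here; I
don't actually know how FiPy/Trilinos partitions the mesh, so treat
this only as an estimate:

```python
def halo_ratio(nx, ny, nprocs):
    """Rough ratio of halo-exchange traffic to local work per sweep.

    Assumes the nx-by-ny grid is split into nprocs slabs along y; each
    interior slab exchanges two rows of nx ghost cells while updating
    about nx * ny / nprocs cells of its own.
    """
    ghost_cells = 2.0 * nx
    local_cells = nx * ny / nprocs
    return ghost_cells / local_cells
```

With these numbers, halo_ratio(256, 256, 8) is four times
halo_ratio(1024, 1024, 8), which would fit Eddie's observation that
the larger grid keeps scaling to higher core counts.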

Eddie, did you run your benchmark with a problem you are solving or
with an "example problem" from the Manual? If it is from (or similar
to) the Manual, could you share the script or say which "example
problem" you used? We would then run similar benchmarks to finally
understand what governs the performance efficiency. If we don't get
the same results, that would point to problems on our sides.

Thanks to all!

Cheers,
Igor.

