Dear all,

I was out for a few days, so I was not able to follow the thread. I have
now printed out all the posts I missed, and it seems you figured out
what the problem was and why there was no great performance increase
when switching to parallel runs.

As I understand it now, the problem is that the Trilinos solvers are not
as efficient as Pysparse, so when we switch on additional processors and
at the same time start using Trilinos, the performance gained from the
additional processors is eaten up by the inefficiency of the Trilinos
solvers :-(. As I understand it, this is on the Trilinos side and the
FiPy developers cannot fix it (please correct me if I'm wrong), so we
will have to live with this for *some* time...

In case it is still important, or as one more data point for the
statistics, here are my results for the "anisotropy test":
                                  100     500     1000
pysparse                          1.7    18.3     88.5
trilinos Default                  4.0    78.4    340.0
trilinos Default (no precon)      2.8    47.0    202.2
trilinos PCG                      3.9
trilinos PCG (no precon)          2.4    36.7    147.3

trilinos PCG (no precon) -np 2    2.4    14.8     87.5
trilinos PCG (no precon) -np 4    5.2    15.4     46.5
trilinos PCG (no precon) -np 8    9.9    15.5     35.5

Default: solver = DefaultSolver()
PCG:    solver = LinearPCGSolver()

How do you get the _prepareLinearSystem time and the _solve time separately?
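(In case it helps: one way to get at the two times is Python's cProfile.
The method names _prepareLinearSystem and _solve are the ones mentioned
in this thread; the two stand-in functions below only imitate them so
the snippet runs without FiPy — in a real run you would profile the
actual solve call and filter the report for those names.)

```python
import cProfile
import io
import pstats

def _prepareLinearSystem():
    # stand-in for matrix assembly (replace with the real FiPy call)
    sum(i * i for i in range(200_000))

def _solve():
    # stand-in for the actual linear solve
    sum(i * i for i in range(400_000))

def solve_step():
    _prepareLinearSystem()
    _solve()

profiler = cProfile.Profile()
profiler.enable()
solve_step()
profiler.disable()

# Restrict the report to the two methods of interest; the argument to
# print_stats is a regular-expression filter on the function names.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).print_stats("_prepareLinearSystem|_solve")
report = stream.getvalue()
print(report)
```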

Thanks, everybody, for figuring out the problem. I think, however, that
we will return to this issue once we understand how the choice of
solver affects the results...

Kind regards,
Igor.

PS concerning scaling of the performance:

For this problem I see a good gain up to 2 real cores at 500x500 and up
to 4 real cores at 1000x1000, and no gain at all from hyperthreading at
500x500. Only at 1000x1000 is there a small gain from the hyperthreaded
cores. For small systems I saw no gain at all; performance just
decreased.
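The speedups follow directly from the timings in the table above
(serial baseline is the "trilinos PCG (no precon)" row):

```python
# Wall-clock times from the anisotropy-test table,
# "trilinos PCG (no precon)" rows.
serial = {"500": 36.7, "1000": 147.3}
parallel = {
    2: {"500": 14.8, "1000": 87.5},
    4: {"500": 15.4, "1000": 46.5},
    8: {"500": 15.5, "1000": 35.5},
}

# speedup = serial time / parallel time, per core count and grid size
speedups = {
    np_: {size: round(serial[size] / t, 2) for size, t in times.items()}
    for np_, times in parallel.items()
}
print(speedups)
```

i.e. at 1000x1000 the speedup is about 1.7x on 2 cores, 3.2x on 4, and
4.2x on 8, while at 500x500 it saturates around 2.4x beyond 2 cores.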

PS2:
I didn't see this line (-D CMAKE_BUILD_TYPE:STRING=DEBUG \) in my
build. What is the default build type, and how can I check which one I
have?
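For reference, CMake records the configured build type in
CMakeCache.txt in the build directory (if the value is empty, no
explicit build type was set). A small sketch that just parses that
file; the demo directory and cache line are fabricated so the snippet
is self-contained:

```python
from pathlib import Path
import tempfile

def cmake_build_type(build_dir="."):
    """Return the CMAKE_BUILD_TYPE recorded in CMakeCache.txt, or None."""
    cache = Path(build_dir) / "CMakeCache.txt"
    if not cache.exists():
        return None
    for line in cache.read_text().splitlines():
        # cache entries look like NAME:TYPE=VALUE
        if line.startswith("CMAKE_BUILD_TYPE:"):
            return line.split("=", 1)[1]
    return None

# Demo with a fabricated cache file; against a real build you would
# call cmake_build_type("/path/to/build").
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "CMakeCache.txt").write_text("CMAKE_BUILD_TYPE:STRING=Release\n")
    result = cmake_build_type(d)
print(result)
```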
