Thanks, Trevor, for continuing to plug away at this, and for sorting out the 
threading issue.

FYI, your efforts to find pygist are commendable, but ultimately futile. 
https://github.com/usnistgov/fipy/pull/453, which you accepted and merged, did 
away with the GistViewers, as they were deprecated years ago.

On Aug 25, 2015, at 1:06 PM, Keller, Trevor <[email protected]> wrote:

> After further tinkering, I have some additional comments on building 
> FiPy+Trilinos on Linux. Some are reflected in the revised build guide, 
> attached; some apply to running FiPy on the command line.
> 
> First, two corrections to the original build guide. While pytrilinos is 
> compiled when you build Trilinos, it is not installed with the rest. An 
> additional instruction is included to take care of that. Also, the original 
> guide left off gist, because it was difficult to find the right one: there 
> are many software packages by that name. The correct link is now included.
> 
> Next, while executing FiPy in parallel with Trilinos, numerous threads were 
> being launched. I therefore recompiled Trilinos with OpenMP support 
> explicitly enabled, and updated the build guide for the new flags. While the 
> original guide kept the command on one line for easy copying, the revision 
> splits it up for readability. Note that several detailed Trilinos build 
> guides for HPC are out there; this one 
> (https://redmine.scorec.rpi.edu/projects/albany-rpi/wiki/Installing_Trilinos) 
> exposes many flags by example, which I found quite useful. I should state 
> here that, while the Trilinos build command has grown, it may or may not have 
> improved: FiPy uses a subset of Trilinos' capabilities, so building the 
> entire tool is excessive. It will be worth optimizing which modules are built 
> after studying how FiPy interacts with Trilinos in greater detail.
> 
> Finally, I solved the baffling mystery of the speedup problem. The culprit 
> appears to be a bad interaction between Trilinos' greed for threads and 
> Python's Global Interpreter Lock (GIL). By default on my test computer, every 
> MPI rank in the run spawns as many threads as there are cores available, 
> which is unexpected. It could be that Trilinos is designed to have one MPI 
> rank per node of a cluster, so each of the child threads would help with 
> computation but would not compete for I/O resources during ghost cell 
> exchanges and file I/O. However, Python's GIL binds all the child threads to 
> the same core as their parent! So instead of improving performance, each core 
> suffers a heavy overhead from managing those idling threads. 
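> 
> The GIL effect is easy to reproduce in a toy example (hypothetical, not part 
> of the build guide): CPU-bound work split across Python threads is 
> serialized by the GIL, so it gains nothing over running serially, much like 
> the idle OpenMP threads described above.

```python
# Toy demo: the GIL serializes CPU-bound Python threads, so the
# threaded run is about as slow as the serial one, plus scheduling
# overhead -- not the ~4x speedup one might naively expect.
import threading
import time

def burn(n):
    """A purely CPU-bound loop; holds the GIL the whole time."""
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 200_000

t0 = time.perf_counter()
serial = [burn(N) for _ in range(4)]
t_serial = time.perf_counter() - t0

results = [None] * 4

def worker(k):
    results[k] = burn(N)

threads = [threading.Thread(target=worker, args=(k,)) for k in range(4)]
t0 = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
t_threads = time.perf_counter() - t0

print(f"serial: {t_serial:.3f}s  threaded: {t_threads:.3f}s")
```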
> 
> The fix I've used is to quash the extra threads with the OpenMP environment 
> variable: before launching a FiPy script in parallel, export 
> OMP_NUM_THREADS=1. The effect is quite dramatic (results for a 6-core Intel 
> with 2x threading). 
> 
> $ echo "solver  gridsz  avg time"; echo -n "serial"; python mpitest.py; echo 
> -n "trilin"; python mpitest.py --trilinos; for n in {1..12}; do echo -n "np = 
> ${n}"; mpirun -np $n python mpitest.py --trilinos; done
> solver  gridsz  avg time
> serial  27900   0.130390
> trilin  27900   0.941082
> np = 1  27900   1.080579
> np = 2  14973   1.717842
> np = 3  10230   4.968959
> np = 4   7958   5.462615
> np = 5   6542   7.546303
> np = 6   5463   8.831044
> np = 7   4881   8.612932
> np = 8   4292   8.786715
> np = 9   4101  10.820657
> np = 10  3714  12.011446
> np = 11  3284  13.163548
> np = 12  3012  14.257576
> 
> $ export OMP_NUM_THREADS=1; echo "solver  gridsz  avg time"; echo -n 
> "serial"; python mpitest.py; echo -n "trilin"; python mpitest.py --trilinos; 
> for n in {1..12}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py 
> --trilinos; done
> solver  gridsz  avg time
> serial  27900   0.139227
> trilin  27900   0.284342
> np = 1  27900   0.268329
> np = 2  14973   0.157163
> np = 3  10230   0.124317
> np = 4   7958   0.091335
> np = 5   6542   0.080261
> np = 6   5463   0.067957
> np = 7   4881   0.100654
> np = 8   4292   0.085887
> np = 9   4101   0.089979
> np = 10  3714   0.083326
> np = 11  3284   0.081005
> np = 12  3012   0.076399
> 
> Now, that is more like it: runtimes monotonically decrease toward np=6, 
> jump, then decrease (with some jitter) toward np=12. The revised Python 
> script, which reports average runtimes for easier analysis & plotting, is 
> available at https://gist.github.com/tkphd/7f62afc064448ca80025.
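> 
> An alternative to exporting the variable in the shell is to pin it from 
> inside the script itself; the assignment must happen before any 
> OpenMP-linked library sizes its thread pool. A hypothetical sketch (the 
> mpitest.py script does not actually do this):

```python
# Hypothetical sketch: pin OpenMP to one thread per MPI rank from within
# the script. This must run before importing anything that links OpenMP
# (e.g. PyTrilinos), because the thread pool is sized at initialization.
import os

os.environ["OMP_NUM_THREADS"] = "1"

# ...only now import the solver stack, for example:
# from PyTrilinos import Epetra

print(os.environ["OMP_NUM_THREADS"])
```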
> 
> Trevor Keller, Ph. D.
> National Institute of Standards and Technology
> 100 Bureau Dr., MS 8550; Gaithersburg, MD 20899
> Office: 223/A131 or (301) 975-2889
> 
> From: [email protected] <[email protected]> on behalf of Warren, 
> James A. <[email protected]>
> Sent: Thursday, July 30, 2015 10:59 AM
> To: FIPY
> Subject: Re: Notes on FiPy+Trilinos installation in Anaconda
>  
> Trevor, this is very valuable.  Many thanks.
> 
> Dr. James A Warren
> Director, Materials Genome Program
> Material Measurement Laboratory
> National Institute of Standards and Technology
> (301)-975-5708
> On 29 July 2015 12:52, "Keller, Trevor" <[email protected]> wrote:
> Hello fellow FiPyers,
> 
> There are several bits and pieces mentioning FiPy and Trilinos here, in the 
> user's manual, and around the Internet, but I haven't found a detailed guide 
> to install FiPy with Trilinos from source. I recently gave it a try on Debian 
> -- and it wasn't too bad. The whole process, which involved a couple false 
> starts, took me a day and a half. It is my hope in publishing this bare-bones 
> guide that fellow travelers can get it running on Linux in just a few hours. 
> If you'd like to follow this guide, the standard disclaimer applies: you are 
> downloading source code from the Internet and executing it on your machine. 
> Proceed with the utmost caution.
> 
> My machine runs Debian GNU/Linux 7.8 "wheezy" on a 4-core Intel processor. I 
> am not a sudoer, so I used a virtual environment: conda, from the Anaconda 
> Python distribution. The various source packages were extracted and compiled 
> within the ~/Downloads directory, then installed into the conda environment. 
> Many of the Python programs are available directly through Anaconda, and 
> should probably be installed through that package manager rather than from 
> source. I chose to compile everything to (a) see if I could and (b) ensure 
> compiler and library consistency. 
> 
> The attached build notes indicate the order in which packages were installed, 
> provide the home- or download-page URL for the software project, and specify 
> the commands I used to compile and install the code. Installation of Anaconda 
> is excluded, since I did not have to do it; commands to download, extract, 
> and cd into the source code are excluded since the latest versions change 
> almost daily.
> 
> After installation, I executed Daniel Wheeler's test script 
> (https://gist.github.com/wd15/8717979) to confirm everything worked. This 
> turned up a couple of interesting things. First, FiPy as-is compares 
> versions as strings and so believes gmsh 2.10.0 is less than 2.5.0. I 
> revised my local copy to compare versions as '.'-delimited tuples of 
> integers, which allowed the code to execute in parallel; I'm working on a 
> patch. Second, the runtimes increased through np=4, then dropped at np=5 and 
> beyond to values similar to Dr. Wheeler's 
> (http://wd15.github.io/2014/01/30/fipy-trilinos-anaconda/). The machine has 
> only four cores and is not hyperthreaded, so this was a surprise. It could 
> be that timing differences for heavier workloads will follow expected 
> scaling laws, but I am baffled by the result.
> 
> Results table (run type, MPI rank, grid size, time; modified script at 
> https://gist.github.com/anonymous/43afb6c567ebfd6d3b2f):
> 
> $ echo -n "serial"; python mpitest.py; echo -n "trilin"; python mpitest.py 
> --trilinos; for n in {1..6}; do echo -n "np = ${n}"; mpirun -np $n python 
> mpitest.py --trilinos; done
> serial  0       27900   0.134404
> trilin  0       27900   0.275607
> np = 1  0       27900   0.664498
> np = 2  0       14973   2.783230
>         1       14973   2.797370
> np = 3  0       10230   3.167559
>         1       10480   3.174004
>         2       10538   3.196746
> np = 4  0       7958    5.139595
>         1       7971    5.139695
>         2       8106    5.139710
>         3       7955    5.139925
> np = 5  0       6542    0.499731
>         1       6538    0.500179
>         2       6534    0.500313
>         3       6655    0.498517
>         4       6689    0.500652
> np = 6  0       5463    0.447847
>         1       5584    0.448344
>         2       5594    0.445663
>         3       5512    0.449028
>         4       5570    0.446170
>         5       5571    0.449486
> 
> Thoughts on this anomalous behavior would be greatly appreciated!
> 
> Good luck,
> Trevor
> 
> 
> 
> <fipy_conda_notes.txt>_______________________________________________
> fipy mailing list
> [email protected]
> http://www.ctcms.nist.gov/fipy
>  [ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]

