Thanks, Trevor, for continuing to plug at this, and for sorting out the threading issue.
FYI, your efforts to find pygist are commendable, but ultimately futile. https://github.com/usnistgov/fipy/pull/453, which you accepted and merged, did away with the GistViewers, as they were deprecated years ago.

On Aug 25, 2015, at 1:06 PM, Keller, Trevor <[email protected]> wrote:

> After further tinkering, I have some additional comments on building
> FiPy+Trilinos on Linux. Some are reflected in the revised build guide,
> attached; some apply to running FiPy on the command line.
>
> First, two corrections to the original build guide. While pytrilinos is
> compiled when you build Trilinos, it is not installed with the rest. An
> additional instruction is included to take care of that. Also, the original
> guide left off gist, because it was difficult to find the right one: there
> are many software packages by that name. The correct link is now included.
>
> Next, while executing FiPy in parallel with Trilinos, numerous threads were
> being launched. I therefore recompiled Trilinos with OpenMP support
> explicitly enabled, and updated the build guide for the new flags. While the
> original guide kept the command on one line for easy copying, the revision
> splits it up for readability. Note that several detailed Trilinos build
> guides for HPC are out there; this one
> (https://redmine.scorec.rpi.edu/projects/albany-rpi/wiki/Installing_Trilinos)
> exposes many flags by example, which I found quite useful. I should state
> here that, while the Trilinos build command has grown, it may or may not have
> improved: FiPy uses a subset of Trilinos' capabilities, so building the
> entire tool is excessive. It will be worth optimizing which modules are built
> after studying how FiPy interacts with Trilinos in greater detail.
>
> Finally, I solved the baffling mystery of the speedup problem. The culprit
> appears to be a bad interaction between Trilinos' greed for threads and
> Python's Global Interpreter Lock.
> By default on my test computer, every MPI
> rank in the run spawns as many threads as there are cores available, which is
> unexpected. It could be that Trilinos is designed to have one MPI rank per
> node of a cluster, so each of the child threads would help with computation
> but would not compete for I/O resources during ghost cell exchanges and file
> I/O. However, Python's GIL binds all the child threads to the same core as
> their parent! So instead of improving performance, each core suffers a heavy
> overhead from managing those idling threads.
>
> The fix I've used is to quash threads using the OpenMP environment
> variable. Before launching a FiPy script in parallel, export
> OMP_NUM_THREADS=1. The effect is quite dramatic (results for a 6-core Intel
> with 2x threading).
>
> $ echo "solver gridsz avg time"; echo -n "serial"; python mpitest.py;
>   echo -n "trilin"; python mpitest.py --trilinos;
>   for n in {1..12}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py --trilinos; done
> solver   gridsz   avg time
> serial    27900   0.130390
> trilin    27900   0.941082
> np = 1    27900   1.080579
> np = 2    14973   1.717842
> np = 3    10230   4.968959
> np = 4     7958   5.462615
> np = 5     6542   7.546303
> np = 6     5463   8.831044
> np = 7     4881   8.612932
> np = 8     4292   8.786715
> np = 9     4101  10.820657
> np = 10    3714  12.011446
> np = 11    3284  13.163548
> np = 12    3012  14.257576
>
> $ export OMP_NUM_THREADS=1; echo "solver gridsz avg time"; echo -n "serial"; python mpitest.py;
>   echo -n "trilin"; python mpitest.py --trilinos;
>   for n in {1..12}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py --trilinos; done
> solver   gridsz   avg time
> serial    27900   0.139227
> trilin    27900   0.284342
> np = 1    27900   0.268329
> np = 2    14973   0.157163
> np = 3    10230   0.124317
> np = 4     7958   0.091335
> np = 5     6542   0.080261
> np = 6     5463   0.067957
> np = 7     4881   0.100654
> np = 8     4292   0.085887
> np = 9     4101   0.089979
> np = 10    3714   0.083326
> np = 11    3284   0.081005
> np = 12    3012   0.076399
>
> Now, that is more like it: runtimes monotonically decrease toward n=6, jump,
> then decrease (with some jitter) toward n=12. The revised Python script
> showing average runtimes for easier analysis & plotting is available at
> https://gist.github.com/tkphd/7f62afc064448ca80025.
>
> Trevor Keller, Ph.D.
> National Institute of Standards and Technology
> 100 Bureau Dr., MS 8550; Gaithersburg, MD 20899
> Office: 223/A131 or (301) 975-2889
>
> From: [email protected] <[email protected]> on behalf of Warren,
> James A. <[email protected]>
> Sent: Thursday, July 30, 2015 10:59 AM
> To: FIPY
> Subject: Re: Notes on FiPy+Trilinos installation in Anaconda
>
> Trevor, this is very valuable. Many thanks.
>
> Dr. James A Warren
> Director, Materials Genome Program
> Material Measurement Laboratory
> National Institute of Standards and Technology
> (301)-975-5708
> On: 29 July 2015 12:52, "Keller, Trevor" <[email protected]> wrote:
> Hello fellow FiPyers,
>
> There are several bits and pieces mentioning FiPy and Trilinos here, in the
> user's manual, and around the Internet, but I haven't found a detailed guide
> to install FiPy with Trilinos from source. I recently gave it a try on Debian
> -- and it wasn't too bad. The whole process, which involved a couple false
> starts, took me a day and a half. It is my hope in publishing this bare-bones
> guide that fellow travelers can get it running on Linux in just a few hours.
> If you'd like to follow this guide, the standard disclaimer applies: you are
> downloading source code from the Internet and executing it on your machine.
> Proceed with the utmost caution.
>
> My machine runs Debian GNU/Linux 7.8 "wheezy" on a 4-core Intel processor. I
> am not a sudoer, so a virtual environment was used: conda, from the Anaconda
> Python distribution. The various source codes were extracted and compiled
> within the ~/Downloads directory, then installed into the conda environment.
> Many of the Python programs are available directly through Anaconda, and
> should probably be installed through that package manager rather than from
> source. I chose to compile everything to (a) see if I could and (b) ensure
> compiler and library consistency.
>
> The attached build notes indicate the order in which packages were installed,
> provide the home- or download-page URL for the software project, and specify
> the commands I used to compile and install the code. Installation of Anaconda
> is excluded, since I did not have to do it; commands to download, extract,
> and cd into the source code are excluded since the latest versions change
> almost daily.
>
> After installation, I executed Daniel Wheeler's test script
> (https://gist.github.com/wd15/8717979) to confirm everything worked. This
> resulted in a couple of interesting things. First, FiPy as-is compares
> versions as strings and believes gmsh 2.10.0 is less than 2.5.0. I revised my
> local copy to compare versions as '.'-delimited tuples, which allowed the
> code to execute in parallel; I'm working on a patch. Second, the runtimes
> increased through np=4, then dropped at np=5 and beyond to values similar to
> Dr. Wheeler's (http://wd15.github.io/2014/01/30/fipy-trilinos-anaconda/). The
> machine has only four cores and is not hyperthreaded, so this was a surprise.
> It could be that timing differences for heavier workloads will follow
> expected scaling laws, but I am baffled by the result.
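
For anyone reading along, the gmsh version problem described above can be reproduced in a few lines. This is only a minimal sketch of comparing '.'-delimited versions as integer tuples; the helper names parse_version and version_at_least are hypothetical, and this is neither FiPy's code nor the patch mentioned above. It assumes purely numeric components, so a version such as "2.10.0-svn" would raise ValueError.

```python
def parse_version(text):
    """Split a dotted version string such as '2.10.0' into a tuple of ints.

    Hypothetical helper for illustration; assumes every component is numeric.
    """
    return tuple(int(part) for part in text.strip().split("."))


def version_at_least(found, required):
    """Return True when `found` is the same as or newer than `required`."""
    return parse_version(found) >= parse_version(required)


# Strings compare character by character, so '1' < '5' decides the result
# and 2.10.0 is wrongly judged older than 2.5.0.
print("2.10.0" < "2.5.0")

# Integer tuples compare component by component, so 10 > 5 decides it.
print(version_at_least("2.10.0", "2.5.0"))
```

Both lines print True: the first is the misleading string comparison, the second is the numeric comparison that accepts gmsh 2.10.0.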
>
> Results table (run type, MPI rank, grid size, time; modified script at
> https://gist.github.com/anonymous/43afb6c567ebfd6d3b2f):
>
> $ echo -n "serial"; python mpitest.py; echo -n "trilin"; python mpitest.py --trilinos;
>   for n in {1..6}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py --trilinos; done
> serial   0  27900  0.134404
> trilin   0  27900  0.275607
> np = 1   0  27900  0.664498
> np = 2   0  14973  2.783230
>          1  14973  2.797370
> np = 3   0  10230  3.167559
>          1  10480  3.174004
>          2  10538  3.196746
> np = 4   0   7958  5.139595
>          1   7971  5.139695
>          2   8106  5.139710
>          3   7955  5.139925
> np = 5   0   6542  0.499731
>          1   6538  0.500179
>          2   6534  0.500313
>          3   6655  0.498517
>          4   6689  0.500652
> np = 6   0   5463  0.447847
>          1   5584  0.448344
>          2   5594  0.445663
>          3   5512  0.449028
>          4   5570  0.446170
>          5   5571  0.449486
>
> Thoughts on this anomalous behavior would be greatly appreciated!
>
> Good luck,
> Trevor
>
> <fipy_conda_notes.txt>
> _______________________________________________
> fipy mailing list
> [email protected]
> http://www.ctcms.nist.gov/fipy
> [ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]
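
The OMP_NUM_THREADS=1 fix from the quoted August message can also be applied from Python rather than from the shell. What follows is a rough sketch under stated assumptions, not anything taken from FiPy or from the mpitest.py gist: the helper names single_thread_env and mpirun_command are hypothetical, and the command simply mirrors the mpirun line quoted above. Actually launching it would require mpirun, FiPy and PyTrilinos to be installed.

```python
import os


def single_thread_env(base=None):
    """Return a copy of an environment with OpenMP limited to one thread.

    Hypothetical helper: copies os.environ (or `base`) so the caller's own
    environment is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


def mpirun_command(n_ranks, script="mpitest.py"):
    """Build the argument list for the mpirun call quoted in the thread."""
    return ["mpirun", "-np", str(n_ranks), "python", script, "--trilinos"]


# Show what would be launched for two ranks; nothing is executed here.
print(" ".join(mpirun_command(2)))
print(single_thread_env()["OMP_NUM_THREADS"])
```

To run the timing for real, the two pieces could be passed to the standard library, for example subprocess.run(mpirun_command(n), env=single_thread_env(), check=True), once for each rank count of interest.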
