On 03/05/13 14:30, Ralph Castain wrote:
> On May 2, 2013, at 9:18 PM, Christopher Samuel
> <sam...@unimelb.edu.au> wrote:
>
>> We're using Slurm, and it supports them already apparently, so I'm
>> not sure if that helps?
>
> It does - but to be clear: you're saying that you can directly launch
> processes onto the Phi's via srun?

Ah no, Slurm 2.5 supports them as coprocessors, allocated in the same
way as GPUs are. I've been told Slurm 2.6 (under development) may
support them as nodes in their own right, but that's not something
I've had time to look into myself (yet).

> If so, then this may not be a problem, assuming you can get
> confirmation that the Phi's have direct access to the interconnects.

I'll see what I can do. There is a long README, which will be my light
reading on the train home tonight, here:

http://registrationcenter.intel.com/irc_nas/3047/readme-en.txt

This seems to indicate how that works, but other parts imply that it
*may* require Intel True Scale InfiniBand adapters:

  3.4 Starting Intel(R) MPSS with OFED Support

  1) Start the Intel(R) MPSS service. Section 2.3, "Starting
     Intel(R) MPSS Services" explains how. Do not proceed any
     further if Intel(R) MPSS is not started.

  2) Start IB and HCA services.

     user_prompt> sudo service openibd start
     user_prompt> sudo service opensmd start

  3) Start the Intel(R) Xeon Phi(TM) coprocessor specific OFED
     service.
     user_prompt> sudo service ofed-mic start

  4) To start the experimental ccl-proxy service (see
     /etc/mpxyd.conf):

     user_prompt> sudo service mpxyd start

  3.5 Stopping Intel(R) MPSS with OFED Support

  o If the installed version is earlier than 2.x.28xx, unload the
    driver using:

     user_prompt> sudo modprobe -r mic

  o If the installed version is 2.x.28xx or later, unload the driver
    using:

     user_prompt> sudo service ofed-mic stop
     user_prompt> sudo service mpss stop
     user_prompt> sudo service mpss unload
     user_prompt> sudo service opensmd stop
     user_prompt> sudo service openibd stop

  o If the experimental ccl-proxy driver was started, unload the
    driver using:

     user_prompt> sudo service mpxyd stop

> If the answer to both is "yes", then just srun the MPI procs
> directly - we support direct launch and use PMI to wireup. Problem
> solved :-)

That would be ideal. I'll do more digging into Slurm 2.6 (we had
planned on starting off with it, but treating the Phis as
coprocessors; this may be enough for us to change).

> And yes - that support is indeed in the 1.6 series...just configure
> --with-pmi. You may need to provide the path to where pmi.h is
> located under the slurm install, but probably not.

Brilliant, thanks!

All the best,
Chris

-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
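For anyone following along, the build-and-launch recipe Ralph describes would look roughly like the sketch below. This is untested on our side; the install prefixes, task count, and application name are placeholders, and whether you need `--mpi=pmi2` (or can rely on the default PMI plugin) depends on your Slurm build:

```shell
# Build Open MPI 1.6.x with PMI support so srun can launch and wire up
# the MPI processes directly, with no mpirun step. The path argument is
# only needed if configure cannot find pmi.h under the Slurm install
# on its own (as Ralph notes, it probably can).
./configure --prefix=/usr/local/openmpi-1.6 \
            --with-pmi=/usr/local/slurm     # assumed Slurm prefix
make && make install

# Then launch the MPI job directly; Slurm's PMI handles the wireup:
srun --ntasks=64 --mpi=pmi2 ./my_mpi_app    # placeholder names
```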