Hi,

I don't have any kind of record that would be useful. From memory, here are a few cases where this happened:

* Within R, users load the module and run install.packages(....) on a random node. Later they want to use that package in a script, and it gives "random" illegal instruction errors depending on where it ran. User feedback is that this "never happens" with an RPM-based R install when they put packages in their own R user libs. I know this is not directly EB related; still, this is the user feedback I get.

* We have 3 geographically distributed sites and would like to be able to compile once and then rsync to the others. We still run into problems if we don't carefully select the host with the fewest CPU features: after the rsync, software simply bails out. We are in the process of investigating this. It seems OpenBLAS related, which looks like it's doing some kind of optimization on its own. I hope to be able to look at what EPEL or Debian do to get rid of this.

We did have EASYBUILD_OPTARCH="" for some time in the config file and have now, since ~EB 2.8.2, switched to EASYBUILD_OPTARCH="GENERIC". I had the impression that, in the past, the empty string was the value for "just don't optimize". I might have been wrong :)

One thing I want to make clear: the "constant problem" is bitching and whining on a very high level. We came from a basically unmanaged installation where people installed in any way they saw fit and kept complaining about the pain of finding out how to redo installations. All of this is solved. But as soon as someone solves a good part of the actual problem, people (that would be me) get greedy and want everything to work magically.

Again: my world became a better place because of EB and I definitely do NOT want to go back to where I was before :)

/Martin

On Tue, Oct 4, 2016 at 10:06 AM Kenneth Hoste <kenneth.ho...@ugent.be> wrote:

Hi Martin,

On 30/09/16 18:55, Martin wrote:

I think this is a recurring question. My impression is that "HPC" seems to somehow imply that there's a divide in people.
With quite some exaggeration: one group wants to squeeze out every possible CPU cycle and in turn is willing to invest the time to recompile multiple times, even within the same CPU family (like here Xeon v1, v2, ..., v5). The second group (like me) would simply like to have repeatable builds. I'd rather compile against Pentium I but have reliable builds that will run on all the hardware that is floating around.

Is this something that should be discussed more broadly? Maybe whether testing should include different sets of CPU flags? I'm pretty sure Kenneth (or UGent) can't pull this off. Something where volunteers could maybe provide build slaves or similar.

I'm definitely having the constant problem that some easyconfigs seem to work very well for most people, but when I do a rollout at my current client they work on only half the nodes due to illegal instruction errors.

We tend to test things with a default configuration (well, except for using Lmod as modules tool, which will become the default soon), so we optimize for the architecture we're building on.

Do you have examples of stuff that fails with illegal instruction errors? Where are you building the software to be used anywhere, and using which configuration?

Are you using --optarch=GENERIC? cfr. http://easybuild.readthedocs.io/en/latest/Controlling_compiler_optimization_flags.html?highlight=GENERIC#optimizing-for-a-generic-processor-architecture-via-optarch-generic

regards,
Kenneth

/Martin

On Fri, Sep 30, 2016 at 11:09 AM Jack Perdue <j-per...@tamu.edu> wrote:

I like Pablo's trick of checking arch before building. Good idea in my book. I do something similar here (as EasyBuild-ada-Westmere) [attached].

jack

On 09/29/2016 03:00 AM, Pablo Escobar Lopez wrote:
> I use a similar solution. I have /soft/apps/arch1, /soft/apps/arch2, ..., /soft/apps/archN on an NFS server and I use autofs to mount the right folder in /soft/apps on each compute node.
> This way the path to access the software stack is the same on every machine (/soft/apps) but each machine uses the right one for its arch. To do this I build each new application on different machines with the different CPU types, which already have the /soft/apps folder mounted. I use a flat naming scheme.
>
> I also have /soft/apps/generic, which is a "special" software stack built with "eb --optarch=GENERIC", so when I need to make the software stack available on a non-compute node (e.g. a web server) I use the generic software stack and I don't need to worry about which specific CPU I have in that machine.
>
> Another trick I do is that my easybuild module (which is in the arch-independent folder so it's the same everywhere) includes some aliases, e.g. eb-ivybridge="eb --configfiles=/soft/apps/easybuild-config-files/ivybridge.cfg". I keep the different easybuild config files in the arch-specific folders (e.g. ivybridge.cfg is only accessible in /soft/apps/ivybridge/easybuild-config-files/ivybridge.cfg); this way, if by mistake I do "eb-nehalem" when I am logged in to an ivybridge machine, I get an error "file nehalem.cfg not found".
>
> As for the way to figure out the arch of the current machine, I am not aware of any easy way. What I would do is add a file /etc/profile.d/cpu_arch.sh which defines an environment variable like "cpu_arch=ivybridge". You can define a variable in your config management tool which can be used to generate this profile file. You can also use this variable to define the right software stack to mount on the machine.
>
> regards,
> Pablo.
>
> 2016-09-29 9:14 GMT+02:00 Åke Sandgren <ake.sandg...@hpc2n.umu.se>:
>
> What we are aiming for is a combination.
>
> The architecture-independent code (like EasyBuild itself, Intel and Portland compilers and similar) is installed under /eb/common/....
> The arch-dependent packages are under /eb/opt/...
> which is mounted per client architecture/OS distro version, from the server file system.
>
> Actually /eb/common is also mounted depending on OS distribution.
>
> One then needs two module use lines, one for /eb/common/modules and one for /eb/opt/modules.
>
> We keep the downloaded source dir (EasyBuild "sourcepath") common for all to minimize repeated downloads.
>
> However, I'm not sure this will work with a hierarchical module layout. Haven't tried it yet.
>
> On 09/29/2016 08:38 AM, Ole Holm Nielsen wrote:
> > We're installing EasyBuild 2.9.0 on our new cluster, and user application codes should be built on top of the latest foss-2016 toolchain.
> >
> > Since our cluster has 4 different generations of Intel Xeon hardware (Nehalem, Sandy/Ivy Bridge, Haswell, Broadwell), users want to compile their application codes optimized for the compute node hardware (gcc -march=native).
> >
> > Question: Is there a best practices method for providing EasyBuild modules which are compiled for different hardware architectures?
> >
> > I can see some possible solutions:
> >
> > 1. Build totally separate and complete EB module trees for each of the architectures. Mount the correct EB module tree by NFS on the same mount point (/home/modules, say) on compute nodes based upon its architecture.
> >
> > 2. Within a single EB hierarchy, build multiple versions of just the application code modules requiring optimized code. This has the advantage of a shared module tree for all non-application-code modules. Users must then identify the compute node's architecture at run-time and load the correct module for that architecture (this sounds complicated for users).
> >
> > Bonus question: What's the most portable, reliable and lightweight way to determine which Intel Xeon architecture you're working on?
> > Googling the question suggests using /proc/cpuinfo:
> >
> > # grep "model name" /proc/cpuinfo | head -1
> > model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
> >
> > but then you must parse this string to discover whether you're on Xeon, Xeon v2, v3, v4, or v5 (from next year).
> >
> > Thanks for sharing your insights,
> > Ole
>
> --
> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
> Internet: a...@hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90-580 14
> Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
>
> --
> Pablo Escobar López
> HPC systems engineer
> sciCORE, University of Basel
> SIB Swiss Institute of Bioinformatics
> http://scicore.unibas.ch

--
http://www.xing.com/profile/Martin_Marcher
http://www.linkedin.com/in/martinmarcher
Mobil: +43 / 660 / 62 45 103
UID: ATU68801424
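
On Ole's bonus question about detecting the architecture: below is a minimal sketch, assuming a Linux x86 node whose /proc/cpuinfo has a "flags" line. Instead of parsing the "model name" string, it classifies the node by CPU feature flags. This is a swapped-in technique, not something proposed in the thread and not something EasyBuild provides. The names ARCH_LEVELS, cpu_flags and classify are hypothetical, and the flag-to-generation table is an illustrative assumption that a site would want to check against its own hardware.

```python
from pathlib import Path

# Newest generation first: (label, flags that must all be present).
# Labels and flag choices are illustrative assumptions, not an official mapping.
ARCH_LEVELS = [
    ("skylake-avx512", {"avx512f"}),
    ("broadwell", {"avx2", "adx"}),
    ("haswell", {"avx2"}),
    ("ivybridge", {"avx", "f16c"}),
    ("sandybridge", {"avx"}),
    ("nehalem", {"sse4_2"}),
]


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def classify(flags):
    """Return the first (newest) label whose required flags are all present."""
    for label, required in ARCH_LEVELS:
        if required <= flags:
            return label
    return "generic"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    # Fall back to an empty string (-> "generic") where the file does not exist.
    text = path.read_text() if path.exists() else ""
    print(classify(cpu_flags(text)))
```

The printed label could feed something like the cpu_arch variable Pablo describes. Note that with this table a Westmere node would be reported as "nehalem", since the two are not distinguished here, and anything without SSE4.2 falls back to "generic".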