Hi
I don't have any kind of record that would be useful.

From memory, a few cases where this happened:

* Within R, users load the module and run install.packages(....) on a
random node. Later they want to use that package in a script and it fails
with "random" illegal instruction errors depending on where it runs. User
feedback is that this "never happens" with an RPM-based R install when they
put packages in their own R user libs. I know this is not directly
EB-related, but it is the user feedback I get.

* We have 3 geographically distributed sites and would like to be able to
compile once and then rsync to the others. We still run into problems
unless we carefully select the host with the fewest CPU features;
otherwise the software simply bails out after the rsync. We are in the
process of investigating this. It seems OpenBLAS-related: it looks like
it's doing some kind of optimization on its own. I hope to be able to look
at what EPEL or Debian do to get rid of this.
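
One way to see up front whether a build host has CPU features that a target
node lacks is to compare the "flags" lines from /proc/cpuinfo. A minimal
sketch, assuming Linux x86 hosts and a POSIX shell; the function names
missing_flags and local_flags are made up for illustration:

```shell
# missing_flags BUILD_FLAGS TARGET_FLAGS
# Prints, one per line, every CPU flag that the build host has but the
# target host does not. Empty output means the target supports everything
# the build host does.
missing_flags() {
    # Normalise the target list to single spaces so whole-word matching works.
    target=$(printf '%s' "$2" | tr -s '[:space:]' ' ')
    for flag in $1; do
        case " $target " in
            *" $flag "*) ;;                   # target has this flag
            *) printf '%s\n' "$flag" ;;       # target lacks this flag
        esac
    done
}

# Flags of the machine this runs on (first CPU only).
local_flags() {
    grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2
}
```

Running missing_flags with the build host's flags as the first argument and
"$(local_flags)" on a target node as the second lists what could trigger an
illegal instruction there. It only reports differences; it does not say
which of them a given binary actually uses.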

We did have EASYBUILD_OPTARCH="" for some time in the config file and have
now, since ~EB 2.8.2, switched to EASYBUILD_OPTARCH="GENERIC". I had the
impression that, in the past, the empty string was the value for "just
don't optimize". I might have been wrong :)
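
For reference, a minimal sketch of the current setup as it would look in
the shell environment of the build account. EASYBUILD_OPTARCH is the
environment-variable form of the --optarch option; the check with
"eb --show-config" only runs if eb is on the PATH:

```shell
# Ask EasyBuild for generic (portable) compiler flags.
export EASYBUILD_OPTARCH=GENERIC

# Print the active setting, if EasyBuild is available in this shell.
if command -v eb >/dev/null 2>&1; then
    eb --show-config | grep -i optarch || echo "optarch not reported"
fi
```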

One thing I want to make clear: The "constant problem" is bitching and
whining on a very high level. We came from a basically unmanaged
installation where people installed in any way they saw fit and kept
complaining about the pain of finding out how to redo installations. All of
this is solved. But as soon as someone solves a good part of the actual
problem people (that would be me) get greedy and want everything to work
magically.

Again: My world became a better place because of EB and I definitely do NOT
want to go back to where I was before :)

/Martin

On Tue, Oct 4, 2016 at 10:06 AM Kenneth Hoste <kenneth.ho...@ugent.be>
wrote:

Hi Martin,


On 30/09/16 18:55, Martin wrote:

I think this is a recurring question.
My impression is that "HPC" seems to somehow imply that there's a divide in
people.

With quite some exaggeration:
One group wants to squeeze out every possible CPU cycle and in turn is
willing to invest the time to recompile multiple times, even within the
same CPU family (like here Xeon v1, v2, ..., v5).
The second group (like me) would simply like to have repeatable builds. I'd
rather compile against a Pentium I and have reliable builds that
will run on all the hardware that is floating around.

Is this something that should be discussed more broadly?

Maybe whether testing should include different sets of CPU flags? I'm
pretty sure Kenneth (or UGent) can't pull this off. It would take something
where volunteers could maybe provide build slaves or similar. I definitely
have the constant problem that some easyconfigs seem to work very well
for most people, but when I do a rollout at my current client they work on
only half the nodes due to illegal instruction errors.


We tend to test things with a default configuration (well, except for using
Lmod as modules tool, which will become the default soon), so we optimize
for the architecture we're building on.

Do you have examples of stuff that fails with illegal instruction errors?

Where are you building the software to be used anywhere, and using which
configuration? Are you using --optarch=GENERIC?
cfr.
http://easybuild.readthedocs.io/en/latest/Controlling_compiler_optimization_flags.html?highlight=GENERIC#optimizing-for-a-generic-processor-architecture-via-optarch-generic


regards,

Kenneth

/Martin

On Fri, Sep 30, 2016 at 11:09 AM Jack Perdue <j-per...@tamu.edu> wrote:

I like Pablo's trick of checking arch before building.  Good idea in my
book.

I do something similar here (as EasyBuild-ada-Westmere) [attached].

jack


On 09/29/2016 03:00 AM, Pablo Escobar Lopez wrote:
> I use a similar solution. I have /soft/apps/arch1
> /soft/apps/arch2..../soft/apps/archN on an NFS server and I use autofs
> to mount the right folder on /soft/apps on each compute node. This way
> the path to access the software stack is the same on every machine
> (/soft/apps) but each machine uses the right one for its arch. To do
> this I build each new application on different machines with the
> different CPU types, which already have the /soft/apps folder mounted.
> I use a flat naming scheme.
>
> I also have /soft/apps/generic which is a "special" software stack
> built with "eb --optarch=GENERIC" so when I need to make the software
> stack available on a non-compute node (e.g. a web server) I use the
> generic software stack and I don't need to worry about which specific
> cpu I have in this machine.
>
> Another trick I do is that my easybuild module (which is in the
> arch-independent folder so it's the same everywhere) includes some
> aliases, e.g. eb-ivybridge="eb
> --configfiles=/soft/apps/easybuild-config-files/ivybridge.cfg". I keep
> the different easybuild config files in the arch-specific folders
> (e.g. ivybridge.cfg is only accessible in
> /soft/apps/ivybridge/easybuild-config-files/ivybridge.cfg). This way, if
> by mistake I do "eb-nehalem" when I am logged in to an ivybridge machine,
> I get an error "file nehalem.cfg not found".
>
> About the way to figure out the arch on the current machine, I am not
> aware of any easy way. What I would do is add a file
> /etc/profile.d/cpu_arch.sh which defines an environment variable like
> "cpu_arch=ivybridge". You can define a variable in your config
> management tool which can be used to generate this profile file. You
> can also use this variable to define the right software stack to mount
> on the machine.
>
> regards,
> Pablo.
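
A minimal sketch of the two ideas Pablo describes above: a profile file
that exports the architecture name, and a wrapper that refuses to run with
a config file for another architecture. The variable name cpu_arch and the
config directory follow his description; the function names
eb_config_for_arch and eb_arch, and the EB_CONFIG_DIR override, are made up
for illustration:

```shell
# Would normally live in /etc/profile.d/cpu_arch.sh, generated by the
# config management tool with the right value per node, e.g.:
#   export cpu_arch=ivybridge

# Directory holding <arch>.cfg; only the file for this node's arch is
# expected to be visible here. EB_CONFIG_DIR is a hypothetical override.
EB_CONFIG_DIR=${EB_CONFIG_DIR:-/soft/apps/easybuild-config-files}

# Print the config file for the given arch, or fail if it is not there.
eb_config_for_arch() {
    cfg="$EB_CONFIG_DIR/$1.cfg"
    if [ ! -f "$cfg" ]; then
        echo "file $1.cfg not found" >&2
        return 1
    fi
    printf '%s\n' "$cfg"
}

# Run eb with the config file matching this node's cpu_arch.
eb_arch() {
    if [ -z "${cpu_arch:-}" ]; then
        echo "cpu_arch is not set" >&2
        return 1
    fi
    cfg=$(eb_config_for_arch "$cpu_arch") || return 1
    eb --configfiles="$cfg" "$@"
}
```

With this, a single eb_arch command replaces the per-architecture aliases,
and picking the wrong architecture by hand is no longer possible.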
>
>
>
> 2016-09-29 9:14 GMT+02:00 Åke Sandgren <ake.sandg...@hpc2n.umu.se>:
>
>     What we are aiming for is a combination.
>
>     The architecture-independent code (like EasyBuild itself, Intel and
>     Portland compilers and similar) is installed under /eb/common/....
>     The arch-dependent packages are under /eb/opt/... which is mounted per
>     client architecture/OS distro version, from the server file system.
>
>     Actually /eb/common is also mounted depending on OS distribution.
>
>     One then needs two module use lines, one for /eb/common/modules and
>     one for /eb/opt/modules.
>
>     We keep the downloaded source dir (Easybuild "sourcepath") common for
>     all to minimize repeating downloads.
>
>     However, I'm not sure this will work with a hierarchical module layout.
>     Haven't tried it yet.
>
>     On 09/29/2016 08:38 AM, Ole Holm Nielsen wrote:
>     > We're installing EasyBuild 2.9.0 on our new cluster, and user
>     > application codes should be built on top of the latest foss-2016
>     > toolchain.
>     >
>     > Since our cluster has 4 different generations of Intel Xeon hardware
>     > (Nehalem, Sandy/Ivy Bridge, Haswell, Broadwell), users want to compile
>     > their application codes optimized for the compute node hardware (gcc
>     > -march=native).
>     >
>     > Question: Is there a best practices method for providing EasyBuild
>     > modules which are compiled for different hardware architectures?
>     >
>     > I can see some possible solutions:
>     >
>     > 1. Build totally separate and complete EB module trees for each of the
>     > architectures.  Mount the correct EB module tree by NFS on the same
>     > mount point (/home/modules, say) on compute nodes based upon its
>     > architecture.
>     >
>     > 2. Within a single EB hierarchy, build multiple versions of just the
>     > application code modules requiring optimized code. This has the
>     > advantage of a shared module tree for all non-application-code modules.
>     > Users must then identify the compute node's architecture at run-time
>     > and load the correct module for that architecture (this sounds
>     > complicated for users).
>     >
>     >
>     > Bonus question: What's the most portable, reliable and lightweight way
>     > to determine which Intel Xeon architecture you're working on? Googling
>     > the question suggests using /proc/cpuinfo:
>     >
>     > # grep "model name" /proc/cpuinfo | head -1
>     > model name    : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
>     >
>     > but then you must parse this string to discover whether you're on Xeon,
>     > Xeon v2, v3, v4, or v5 (from next year).
>     >
>     > Thanks for sharing your insights,
>     > Ole
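
A minimal sketch of the string parsing Ole mentions above, assuming a POSIX
shell and covering only Xeon E5 model names, where the "vN" suffix tracks
the microarchitecture (no suffix: Sandy Bridge, v2: Ivy Bridge, v3:
Haswell, v4: Broadwell). The function name xeon_e5_arch is made up for
illustration, and anything it does not recognise (including Nehalem-era
models, which use a different naming scheme) is reported as "unknown":

```shell
# xeon_e5_arch MODEL_NAME
# Maps a /proc/cpuinfo "model name" string to a microarchitecture name.
xeon_e5_arch() {
    case "$1" in
        *"E5-"*" v4"*)      echo broadwell ;;
        *"E5-"*" v3"*)      echo haswell ;;
        *"E5-"*" v2"*)      echo ivybridge ;;
        *"E5-"*" v"[0-9]*)  echo unknown ;;      # newer suffix, not mapped here
        *"E5-"*)            echo sandybridge ;;  # first generation has no vN
        *)                  echo unknown ;;
    esac
}
```

On a node this could be called as
xeon_e5_arch "$(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2)".
A table like this has to be extended by hand for every new CPU family,
which is the main drawback of parsing the model name.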
>
>     --
>     Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
>     Internet: a...@hpc2n.umu.se  Phone: +46 90 7866134  Fax: +46 90-580 14
>     Mobile: +46 70 7716134  WWW: http://www.hpc2n.umu.se
>
>
>
>
>
> --
> Pablo Escobar López
> HPC systems engineer
> sciCORE, University of Basel
> SIB Swiss Institute of Bioinformatics
> http://scicore.unibas.ch

-- 
-- 
http://www.xing.com/profile/Martin_Marcher
http://www.linkedin.com/in/martinmarcher
Mobil: +43 / 660 / 62 45 103
UID: ATU68801424

