Thanks to all who shared their experience.

Here is a brief summary of the observations:
Four combinations of Intel Fortran or gfortran with Xeon or AMD processors (of approximately the same base frequency) provided similar speed but different results. Time comparison is not straightforward, as the number of iterations required for convergence varied between these four versions (FOCEI, LAPLACIAN, and SAEM with ADVAN13 were used for all tests). The results are numerically different, but not meaningfully so: parameter estimates differ by no more than their respective confidence intervals (a few percent for well-defined parameters, more for parameters with large RSEs). Thus, any of these four combinations can be used, but it is better not to mix them within one analysis. It also seems to be good practice to specify not only the OS and the compiler with its options, but also the processor, or at least the processor type, to ensure exact reproducibility of results.

Unlike earlier (10+ years ago) reports, the (old, v.11) Intel compiler seems to provide similar speed on both new Intel and new AMD processors.

Thanks!
Leonid



On 11/19/2019 4:32 AM, Rikard Nordgren wrote:
Hi Leonid,

When upgrading from gfortran 4.4.7 to 5.1.1 we ran around 20 models with both compilers, with and without the -ffast-math option. The runs were on the same hardware. The differences in the parameter estimates and OFV were in general small. One big difference we saw was that the success of the covariance step was seemingly random: it could succeed with one compiler version but not the other, and it could also start failing when the option was turned off. I have kept the runs, so let me know if you would be interested. I also started some experiments with machine-dependent compiler flags, but as our cluster is heterogeneous I abandoned that testing.
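One reason a flag like -ffast-math can change results is that it allows the compiler to reassociate floating-point sums, and floating-point addition is not associative. A minimal Python illustration (not NONMEM code, just the arithmetic effect):

```python
# Floating-point addition is not associative: summing the same three
# numbers in a different grouping can give a different result. Flags
# such as -ffast-math permit the compiler to regroup sums like this.
left_grouping = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right_grouping = 0.1 + (0.2 + 0.3)  # 0.6

print(left_grouping)
print(right_grouping)
print(left_grouping == right_grouping)  # False
```

The same mechanism applies to vectorized reductions: different hardware (SSE vs AVX code paths, for instance) can imply different summation orders and hence slightly different results.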

I think that getting identical results could be possible, but that it would be quite a challenge. Many components affect the results: the compiler, the compiler flags, the libc implementation, the hardware, and sometimes the operating system. To see, for example, where the standard libraries come into play, you can run nm nonmem on the nonmem executable (on Linux) to list all symbols compiled in. Some are functions from external libraries; for example, my exponential function comes from libc: exp@@GLIBC_2.2.5 . Even the functions that read numbers from text strings could introduce rounding errors, since the text representation is decimal while the internal floating-point number is binary.
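The decimal-to-binary point is easy to see in any language; a small Python illustration (not specific to NONMEM):

```python
from decimal import Decimal
from fractions import Fraction

# "0.1" has no exact binary floating-point representation, so parsing
# the text already rounds: the stored double is slightly larger than
# the true rational value 1/10.
x = float("0.1")

print(Decimal(x))                      # exact value actually stored
print(Fraction(x) == Fraction(1, 10))  # False
```

So even if two platforms ran the estimation identically, differences in how their runtime libraries parse or print numbers could surface as last-digit discrepancies in the data read in or the results written out.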

Best regards,
Rikard Nordgren

--
Rikard Nordgren
Systems developer

Dept of Pharmaceutical Biosciences
Faculty of Pharmacy
Uppsala University
Box 591
75124 Uppsala

Phone: +46 18 4714308
www.farmbio.uu.se/research/researchgroups/pharmacometrics/




On 2019-11-18 23:54, Leonid Gibiansky wrote:
Hi Jeroen,

Thanks for your input, very interesting. As far as the goal is concerned, I am mostly interested in finding options that would give identical results on the two platforms, rather than in speed. So far, no luck: four combinations of gfortran / Intel compilers on Xeon / AMD processors give four sets of results that are close but not identical.

A related question to the group: has anybody experimented with gfortran options (rather than using the defaults provided by the NONMEM distribution)? Any recommendations? Same goal: maximum reproducibility across different OSs, parallelization options, and processor types.

Thanks
Leonid




On 11/18/2019 5:28 PM, Jeroen Elassaiss-Schaap (PD-value B.V.) wrote:
Hi Leonid,

"A while" back we compared model development trajectories and results between two computational platforms, Itanium and Xeon, see https://www.page-meeting.org/?abstract=1188. The results roughly were: 1/3 equal, 1/3 rounding differences, and 1/3 really different results. From discussions with the technically knowledgeable people I worked with at the time, I recall that there are three levels/sources for those differences:

1) computational (hardware) platform

2) compilers (+ optimization settings)

3) libraries (floating point handling does matter)

Assuming you would like to compare the speed of the platforms wrt NONMEM, my advice would be to test a large series of different models, from simple ADVAN1 or 2 to complex ODE, ranging from FO to LAPLACIAN INT NUMERICAL, while keeping compilers and libraries the same. Also small and large datasets, as in some instances you might be testing only the L1/L2/L3 cache strategies and Turbo settings. And with and without parallelization - as that might determine runtime bottlenecks in practice.

Just having a peek at Epyc - seems interesting (I noticed results with gcc 7.4 compilation). As long as you are able to hold the computation in cache, a big if for the 64-core, there might be an advantage.

All in all I am not sure that it is worth the trouble. For any given PK-PD model there is a lot you can tune to gain speed, but the optimal settings might be very different for the next and overrule any platform differences.

Hope this helps,

Jeroen

http://pd-value.com
jer...@pd-value.com
@PD_value
+31 6 23118438
-- More value out of your data!

On 18/11/19 6:34 pm, Leonid Gibiansky wrote:
Thanks Bob and Peter!

The model is quite stable, but this is LAPLACIAN, so it requires second derivatives. At iteration 0, gradients differ by about 50 to 100% between Intel and AMD. This leads to differences in the minimization path and slightly different results. Not different enough to change the recommended dose, but sufficiently different to notice (an OF difference of 6 points; 50% more model evaluations to reach convergence).
Thanks
Leonid



On 11/18/2019 12:15 PM, Bonate, Peter wrote:
Leonid - when you say different, what do you mean? Fixed effects and random effects? Different OFV?

We did a poster at AAPS a decade or so ago comparing results across different platforms using the same data and model. We got different results for the standard errors (which relates to matrix inversion and how it is carried out by different software-hardware configurations). And with overparameterized models we got different error messages - some platforms converged with no problem, while others did not converge and reported R matrix singularity.

Did your problems go beyond this?

pete



Peter Bonate, PhD
Executive Director
Pharmacokinetics, Modeling, and Simulation
Astellas
1 Astellas Way, N3.158
Northbrook, IL  60062
peter.bon...@astellas.com
(224) 205-5855



Details are irrelevant in terms of decision making -  Joe Biden.






-----Original Message-----
From: owner-nmus...@globomaxnm.com <owner-nmus...@globomaxnm.com> On Behalf Of Leonid Gibiansky
Sent: Monday, November 18, 2019 11:05 AM
To: nmusers <nmusers@globomaxnm.com>
Subject: [NMusers] AMD vs Intel

Dear All,

I am testing the new Epyc processors from AMD (comparing with Intel Xeon) and getting different results. Just wondering whether anybody has faced the problem of differences between AMD and Intel processors and knows how to solve it. I am using the Intel compiler but am ready to switch to gfortran or anything else if this would help to get identical results. There were reports in the past of Intel compilers slowing execution on AMD, but in my tests the speed is comparable while the results differ.

Thanks
Leonid













E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy
