Re: [Fms-mom4] MOM4 Performance

Reiner Vogelsang Wed, 12 Feb 2003 01:56:48 -0800

Dear Christopher,

Although I do not work for SGI's compiler group, I would like to add my two cents
to your problem, since a major part of my current work concerns environmental codes.

I modified your routine sub_number_type in the following way:

subroutine sub_number_type ( Hmetric, tau )

  type (horiz_metric_type), intent(inout), target :: Hmetric

  integer :: tau

  integer :: k, kbd=1, ked=2, i, j, is, ie, js, je
  real, pointer :: p_uh_et(:,:,:), p_uel(:,:,:,:,:), p_dyu(:,:), p_dhu(:,:,:), &
                   p_dyte(:,:)

  p_uh_et => Hmetric%uh_et
  p_uel   => Hmetric%uel
  p_dyu   => Hmetric%dyu
  p_dhu   => Hmetric%dhu
  p_dyte  => Hmetric%dyte
  is = lbound(p_uh_et,1)
  ie = ubound(p_uh_et,1)
  js = lbound(p_uh_et,2)
  je = ubound(p_uh_et,2)

  ! do k=kbd,ked
  !    Hmetric%uh_et(:,:,k) = (Hmetric%uel(:,:,k,1,tau)*Hmetric%dyu(:,:)*Hmetric%dhu(:,:,k))/Hmetric%dyte(:,:)
  ! enddo
  ! do k=kbd,ked
  !    p_uh_et(:,:,k) = (p_uel(:,:,k,1,tau)*p_dyu(:,:)*p_dhu(:,:,k))/p_dyte(:,:)
  ! enddo

  do k=kbd,ked
     do j=js,je
        do i=is,ie
           p_uh_et(i,j,k) = (p_uel(i,j,k,1,tau)*p_dyu(i,j)*p_dhu(i,j,k))/p_dyte(i,j)
        enddo
     enddo
  enddo

  ! Alternative
  ! call sub_number_raw( p_uel, p_dhu, p_dyu, p_dyte, p_uh_et, tau )

  return

end subroutine sub_number_type

The performance degradation of that version is only 5 % with respect to your raw version.
By contrast, using array syntax directly on the data object Hmetric of derived type
horiz_metric_type results in a performance degradation of 40 - 45 %.

As marked in the code, I have left an alternative (the commented-out call to sub_number_raw).
That one also runs with a performance degradation of only 5 %.
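
As a minimal, self-contained sketch of the two variants compared above: the layout of
horiz_metric_type is not shown in this thread, so the type below (pointer components with
the ranks used in the routine) is an assumption, and init_metric, update_array_syntax and
update_pointer_loops are hypothetical names. Unlike the routine above, the loops here run
over the full extent in k rather than kbd..ked.

```fortran
! Sketch only: this type layout is an assumption, not the MOM4 definition.
module metric_demo
  implicit none
  private
  public :: horiz_metric_type, init_metric, update_array_syntax, update_pointer_loops

  type horiz_metric_type
     real, pointer :: uh_et(:,:,:)   => null()
     real, pointer :: uel(:,:,:,:,:) => null()
     real, pointer :: dyu(:,:)       => null()
     real, pointer :: dhu(:,:,:)     => null()
     real, pointer :: dyte(:,:)      => null()
  end type horiz_metric_type

contains

  ! Allocate all components and fill them with simple, nonzero test values.
  subroutine init_metric(H, ni, nj, nk)
    type(horiz_metric_type), intent(inout) :: H
    integer, intent(in) :: ni, nj, nk
    integer :: i, j, k
    allocate(H%uh_et(ni,nj,nk), H%uel(ni,nj,nk,1,1), H%dyu(ni,nj), &
             H%dhu(ni,nj,nk), H%dyte(ni,nj))
    H%uh_et = 0.0
    H%dyu   = 2.0
    H%dyte  = 4.0
    do k = 1, nk
       do j = 1, nj
          do i = 1, ni
             H%uel(i,j,k,1,1) = real(i + j + k)
             H%dhu(i,j,k)     = 1.0 + 0.5*real(k)
          end do
       end do
    end do
  end subroutine init_metric

  ! Variant 1: array syntax applied directly to the derived-type components.
  subroutine update_array_syntax(H, tau)
    type(horiz_metric_type), intent(inout) :: H
    integer, intent(in) :: tau
    integer :: k
    do k = 1, size(H%uh_et, 3)
       H%uh_et(:,:,k) = (H%uel(:,:,k,1,tau)*H%dyu(:,:)*H%dhu(:,:,k))/H%dyte(:,:)
    end do
  end subroutine update_array_syntax

  ! Variant 2: local pointer aliases plus explicit i/j/k loops.
  subroutine update_pointer_loops(H, tau)
    type(horiz_metric_type), intent(inout), target :: H
    integer, intent(in) :: tau
    real, pointer :: p_uh_et(:,:,:), p_uel(:,:,:,:,:), p_dyu(:,:), &
                     p_dhu(:,:,:), p_dyte(:,:)
    integer :: i, j, k
    p_uh_et => H%uh_et
    p_uel   => H%uel
    p_dyu   => H%dyu
    p_dhu   => H%dhu
    p_dyte  => H%dyte
    do k = lbound(p_uh_et,3), ubound(p_uh_et,3)
       do j = lbound(p_uh_et,2), ubound(p_uh_et,2)
          do i = lbound(p_uh_et,1), ubound(p_uh_et,1)
             p_uh_et(i,j,k) = (p_uel(i,j,k,1,tau)*p_dyu(i,j)*p_dhu(i,j,k))/p_dyte(i,j)
          end do
       end do
    end do
  end subroutine update_pointer_loops

end module metric_demo
```

Both routines compute the same values; timing them against each other on a given compiler
is the kind of comparison reported above. The sketch makes no claim about which one is
faster elsewhere.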

Perhaps you can pass my modifications on to your personal contact at SGI to give them
some additional hints.

Best regards
Reiner

Christopher Kerr wrote:

This is a response to the email from Fokke Dijkstra concerning the
performance of MOM4.

Most of my work has focused on the difference in performance of
MOM4 with the STATIC and DYNAMIC memory option. This option is not available
in the current release of MOM4; we anticipate that it will be released
with MOM4 later in Spring 2003. The MOM4 study showed the following results:
  With STATIC_MEMORY  the total runtime = 1153.968941 seconds
  With DYNAMIC_MEMORY the total runtime = 1877.834289 seconds
That is, the STATIC_MEMORY run needs roughly 40% less time. The results for the
entire model were obtained on 60 processors running a 15 day integration.
The above times were obtained on an SGI 3800 System with 600 MHz processors.

We then examined the perfex files for the entire code. They showed that the
primary differences were in the decoded loads:
  STATIC_MEMORY  the decoded loads = 9934669073472
  DYNAMIC_MEMORY the decoded loads = 15047988514480
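
As a quick check of the figures above, the relative increase from the STATIC to the DYNAMIC
run can be computed with a small helper. This is only a sketch: percent_increase is a
hypothetical name, and the STATIC load count is read as 9934669073472.

```fortran
module ratio_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Percentage by which "other" exceeds "base".
  pure function percent_increase(base, other) result(pct)
    real(real64), intent(in) :: base, other
    real(real64) :: pct
    pct = 100.0_real64*(other - base)/base
  end function percent_increase
end module ratio_demo
```

With the runtimes (1153.968941 and 1877.834289 seconds) this gives about 63 %, and with the
decoded loads about 51 %, so the extra loads are of the same order as the extra runtime.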

To report the performance problem to the SGI Compiler Group we needed to
produce a "simple" test case that demonstrates the above behavior. After
some analysis, we constructed such a case. It turns out that the problem is
the result of the code generator working on loop constructs that combine array
syntax with derived types that have been allocated with DYNAMIC memory. I
have attached the test case, and you can see from the *.w2f output file that
the poorly performing code produces an increased number of temporary arrays,
which would account for the performance degradation.
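
To make the distinction concrete, here is a minimal sketch of the same array-syntax loop
applied to a derived type with fixed-size components and to one with pointer components
sized at run time. The attached test case is not reproduced in this thread, so the type
names, component names and sizes below are illustrative assumptions, and it is likewise an
assumption that the STATIC and DYNAMIC options correspond to these two kinds of declaration.

```fortran
! Sketch only: type names, components and sizes are illustrative assumptions.
module memory_option_demo
  implicit none
  integer, parameter :: ni = 4, nj = 3, nk = 2

  ! "Static" flavour: every extent is a compile-time constant.
  type metric_static
     real :: dyu(ni,nj)
     real :: dhu(ni,nj,nk)
     real :: uh(ni,nj,nk)
  end type metric_static

  ! "Dynamic" flavour: components are pointers sized at run time.
  type metric_dynamic
     real, pointer :: dyu(:,:)   => null()
     real, pointer :: dhu(:,:,:) => null()
     real, pointer :: uh(:,:,:)  => null()
  end type metric_dynamic

contains

  subroutine alloc_dynamic(H, n1, n2, n3)
    type(metric_dynamic), intent(inout) :: H
    integer, intent(in) :: n1, n2, n3
    allocate(H%dyu(n1,n2), H%dhu(n1,n2,n3), H%uh(n1,n2,n3))
  end subroutine alloc_dynamic

  ! Array syntax on fixed-size components.
  subroutine kernel_static(H)
    type(metric_static), intent(inout) :: H
    integer :: k
    do k = 1, nk
       H%uh(:,:,k) = H%dyu(:,:)*H%dhu(:,:,k)
    end do
  end subroutine kernel_static

  ! The same statement on pointer components.
  subroutine kernel_dynamic(H)
    type(metric_dynamic), intent(inout) :: H
    integer :: k
    do k = 1, size(H%uh, 3)
       H%uh(:,:,k) = H%dyu(:,:)*H%dhu(:,:,k)
    end do
  end subroutine kernel_dynamic

end module memory_option_demo
```

The two kernels give identical results. With pointer components, however, a compiler
generally has to allow for aliasing and non-contiguous storage, which is one plausible
reason for it to introduce temporaries of the kind seen in the *.w2f output.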

This performance bug has already been forwarded to the SGI Compiler Group
and we are waiting to hear back from them. In our studies, we also uncovered
several other loop constructs that performed poorly. They behaved similarly
to the case described above. Before we proceed with these studies,
we are waiting to hear back from SGI, as these may all be part of the same
family of performance bugs.

I did read the email from Fokke Dijkstra and I think he should see an
improvement in MOM4 performance once the above performance bug is fixed. It is
an open question whether the STATIC and DYNAMIC memory options will yield
similar performance improvements on other systems.

Fokke Dijkstra did observe an increase by a factor of two in the number of
floating point operations. My results show no increase in this number. After
talking with Matt Harrison, we may want to look at the use of stencil
operators in MOM4 and see if they are contributing to an increase in floating
point operations.

Christopher
Dr. Christopher L. Kerr
Geophysical Fluid Dynamics Laboratory
Forrestal Campus
Princeton University
Princeton, New Jersey 08542
Telephone: (609) 452-6573
Fax:       (609) 987-5063
Email:     [EMAIL PROTECTED]
Web:       http://www.gfdl.gov/~ck
------------------------------------------------------------------------
[Attachment: perf_sgi_example, Unix Tape Archive (application/x-tar), base64 encoded; not downloaded with message]

-- 
--------------------------------------------------------------------------------
                                             _                                 
                                           )/___                        _---_  
                                         =_/(___)_-__                  (     ) 
                                        / /\\|/O[]/  \c             O   (   )  
Reiner Vogelsang                        \__/ ----'\__/       ..o o O .o  -_-   
Senior System Engineer


Silicon Graphics GmbH                   Home Office
Am Hochacker 3                              
D-85630 Grasbrunn                       52428 Juelich
Germany

Phone   +49-89-46108-0                  +49-2461-939265
Fax     +49-89-46108-222                +49-2461-939266
Mobile  +49-171-3583208
email    [EMAIL PROTECTED]
