[jira] [Updated] (CLIMATE-88) Performance improvements for metrics.py

Alex Goodman (JIRA) Fri, 07 Jun 2013 10:21:35 -0700

     [ https://issues.apache.org/jira/browse/CLIMATE-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alex Goodman updated CLIMATE-88:
--------------------------------

    Description: 
The performance of many of the functions in metrics.py can be improved by
removing unnecessary loops and manipulating array shapes instead.

I have attached example code that benchmarks the original and proposed
methodologies, in case you did not see it on the mailing list.
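A minimal sketch of the loop-free idea, applied here to monthly means rather than the standard deviations used in the benchmark. It assumes the time axis starts in January and holds whole years; the names `monthly_means_loop` and `monthly_means_reshape` are hypothetical and not part of metrics.py. Unlike assigning to `.shape`, `ndarray.reshape` returns a new array object, so the caller's array keeps its shape and no reset step is needed.

```python
import numpy as np

def monthly_means_loop(data, months):
    # Loop version: one boolean mask per calendar month (1-12)
    means = np.empty((12,) + data.shape[1:])
    for m in range(12):
        means[m] = data[months == m + 1].mean(axis=0)
    return means

def monthly_means_reshape(data):
    # Loop-free version (hypothetical helper). Assumes data[0] is January
    # and the time axis holds whole years, so step t is month t % 12 + 1.
    n_time = data.shape[0]
    if n_time % 12 != 0:
        raise ValueError("time axis must span whole years")
    # (years, 12, lat, lon): averaging over axis 0 gives one mean per month.
    # reshape returns a new array object, leaving data.shape untouched.
    years = data.reshape(n_time // 12, 12, *data.shape[1:])
    return years.mean(axis=0)

nyr = 3
data = np.random.rand(12 * nyr, 4, 5)
months = np.tile(np.arange(1, 13), nyr)
print(np.allclose(monthly_means_loop(data, months), monthly_means_reshape(data)))
```

One design note: if the input is not contiguous in memory, `reshape` falls back to making a copy, whereas assigning to `.shape` raises an error.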


  was:
The performance of many of the functions in metrics.py can be improved by
removing unnecessary loops and manipulating array shapes instead.

Here is some example code that benchmarks the original and proposed
methodologies, in case you did not see it on the mailing list:

# Test script comparing two different methodologies for determining monthly
# climatology.
# Author: Alex Goodman

from timeit import Timer
import os
import numpy as np

def calcAnnualCycleStdev(dataset1, times):
    '''
     Purpose:: 
        Calculate monthly standard deviations for every grid point
     Input::
        dataset1 - 3d numpy array of data in (12* number of years,lat,lon) 
        times - 1d numpy array of month numbers (1-12), one per time step
     Output:: 
        stds - 3d numpy array of (12, lat, lon)
    '''
    # The month number of each time step is passed in directly
    months = times
    
    # Empty array to store the monthly standard deviations
    stds = np.empty((12, dataset1.shape[1], dataset1.shape[2]))
    
    # Calculate sample standard deviation month by month (January - December)
    for i in np.arange(12):
        stds[i, :, :] = dataset1[months == i+1, :, :].std(axis = 0, ddof = 1)
        
    return stds
    
def calcAnnualCycleStdev2(dataset1, times):
    '''
     Purpose:: 
        Calculate monthly standard deviations for every grid point
     Input::
        dataset1 - 3d numpy array of data in (12* number of years,lat,lon) 
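        times - 1d numpy array of month numbers (1-12), one per time step;
                the data must start in January and cover whole years
     Output:: 
        stds - 3d numpy array of (12, lat, lon)
    '''
    # Month numbers are not used below: the reshape assumes a January start
    months = times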
    nMonth, nGrdY, nGrdX = dataset1.shape

    # Find the std month by month as before, but this time change the
    # shape of the input array instead of a loop
    dataset1.shape = nMonth // 12, 12, nGrdY, nGrdX
    stds = dataset1.std(axis = 0, ddof = 1)

    # The function receives a reference to the caller's array, not a copy,
    # so the shape must be reset to ensure that dataset1's original shape
    # is preserved after this function is called
    dataset1.shape = nMonth, nGrdY, nGrdX
    
    return stds
    
def main():
    nyrs = [5, 10, 20, 50, 100]
    for nyr in nyrs:
        print('---Starting test for %d years of data---' % nyr)
        # Generate a random numpy array for our benchmark
        data = np.random.rand(12*nyr, 500, 500)
        month = np.tile(np.arange(1,13), nyr)
        print('Testing original function...')
        t = Timer(lambda: calcAnnualCycleStdev(data, month))
        t1 = t.timeit(number=1)
        print('Approximate runtime: %1.2f s' % t1)
        
        # Test revised function
        print('Testing revised function...')
        t = Timer(lambda: calcAnnualCycleStdev2(data, month))
        t2 = t.timeit(number=1)
        print('Approximate runtime: %1.2f s' % t2)
            

if __name__ == '__main__':
    main()
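The benchmark times the two approaches but does not check that they return the same values. A minimal equivalence check, sketched with hypothetical helper names (`stdev_loop`, `stdev_reshape`), compares them on a deliberately non-square grid, where any mix-up between the lat and lon dimensions would show up; a 500 x 500 grid cannot reveal such a mix-up. As above, it assumes a January start and whole years.

```python
import numpy as np

def stdev_loop(data, months):
    # Sample standard deviation (ddof=1) month by month, via boolean masks
    stds = np.empty((12,) + data.shape[1:])
    for m in range(12):
        stds[m] = data[months == m + 1].std(axis=0, ddof=1)
    return stds

def stdev_reshape(data):
    # Same statistic without the loop; reshape does not modify the input
    n_time = data.shape[0]
    return data.reshape(n_time // 12, 12, *data.shape[1:]).std(axis=0, ddof=1)

nyr = 4
data = np.random.rand(12 * nyr, 3, 7)    # non-square grid: 3 lat x 7 lon
months = np.tile(np.arange(1, 13), nyr)
print(np.allclose(stdev_loop(data, months), stdev_reshape(data)))
print(data.shape)                         # input shape is unchanged
```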


    
> Performance improvements for metrics.py
> ---------------------------------------
>
>                 Key: CLIMATE-88
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-88
>             Project: Apache Open Climate Workbench
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 0.1-incubating
>            Reporter: Alex Goodman
>            Assignee: Chris A. Mattmann
>             Fix For: 0.1-incubating
>
>         Attachments: benchmark_metrics.py
>
>
> The performance of many of the functions in metrics.py can be improved by
> removing unnecessary loops and manipulating array shapes instead.
> I have attached example code that benchmarks the original and proposed
> methodologies, in case you did not see it on the mailing list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
