[ 
https://issues.apache.org/jira/browse/CLIMATE-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Goodman updated CLIMATE-88:
--------------------------------

    Attachment: benchmark_metrics.py
    
> Performance improvements for metrics.py
> ---------------------------------------
>
>                 Key: CLIMATE-88
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-88
>             Project: Apache Open Climate Workbench
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 0.1-incubating
>            Reporter: Alex Goodman
>            Assignee: Chris A. Mattmann
>             Fix For: 0.1-incubating
>
>         Attachments: benchmark_metrics.py
>
>
> Many of the functions in metrics.py can be sped up by removing
> unnecessary loops and reshaping the input arrays instead.
> In case you missed it on the mailing list, here is some example code that
> benchmarks the original approach against the proposed one:
> # Test script comparing two different methodologies for determining monthly
> # climatology.
> # Author: Alex Goodman
> from timeit import Timer
> import os
> import numpy as np
> def calcAnnualCycleStdev(dataset1, times):
>     '''
>      Purpose:: 
>         Calculate monthly standard deviations for every grid point
>      Input::
>         dataset1 - 3d numpy array of data in (12* number of years,lat,lon) 
>         times - an array of month numbers (1-12), one per time step
>      Output:: 
>         stds - if 3d numpy was entered, 3d (12,lat,lon)
>     '''
>     # In this benchmark, times already holds the month number of each step
>     months = times
>     
>     # empty array to store the standard deviations
>     stds = np.empty((12, dataset1.shape[1], dataset1.shape[2]))
>     
>     # Calculate sample standard deviation month by month (January - December)
>     for i in np.arange(12):
>         stds[i, :, :] = dataset1[months == i+1, :, :].std(axis = 0, ddof = 1)
>         
>     return stds
>     
> def calcAnnualCycleStdev2(dataset1, times):
>     '''
>      Purpose:: 
>         Calculate monthly standard deviations for every grid point
>      Input::
>         dataset1 - 3d numpy array of data in (12* number of years,lat,lon) 
>         times - an array of month numbers (1-12), one per time step
>      Output:: 
>         stds - if 3d numpy was entered, 3d (12,lat,lon)
>     '''
>     # times is unused here: the reshape below assumes the data start in
>     # January and cover a whole number of years
>     nMonth, nGrdY, nGrdX = dataset1.shape
>     # Find the std month by month as before, but this time change the
>     # shape of the input array instead of a loop
>     dataset1.shape = nMonth//12, 12, nGrdY, nGrdX
>     stds = dataset1.std(axis = 0, ddof = 1)
>     # Since numpy arrays are passed by reference (not copied) into a
>     # function, the shape must be reset to ensure that dataset1's
>     # original shape is preserved after this function is called
>     dataset1.shape = nMonth, nGrdY, nGrdX
>     
>     return stds
>     
> def main():
>     nyrs = [5, 10, 20, 50, 100]
>     for nyr in nyrs:
>         print '---Starting test for %d years of data---' %(nyr)
>         # Generate a random numpy array for our benchmark
>         data = np.random.rand(12*nyr, 500, 500) 
>         month = np.tile(np.arange(1,13), nyr)
>         print 'Testing original function...'
>         t = Timer(lambda: calcAnnualCycleStdev(data, month))
>         t1 = t.timeit(number=1)
>         print 'Approximate runtime: %1.2f s' %(t1)
>         
>         # Test revised function
>         print 'Testing revised function...'
>         t = Timer(lambda: calcAnnualCycleStdev2(data, month))
>         t2 = t.timeit(number=1)
>         print 'Approximate runtime: %1.2f s' %(t2)
>           
> if __name__ == '__main__':
>     main()
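
The benchmark above only times the two functions; it does not check that they agree. A minimal sketch of such a check follows. The function names, the small non-square grid, and the fixed random seed are illustrative assumptions and are not taken from metrics.py. This variant calls reshape() rather than assigning to .shape, so the caller's array is never modified.

```python
import numpy as np

def monthly_std_loop(data, months):
    """Loop over the 12 calendar months and mask the time axis."""
    stds = np.empty((12,) + data.shape[1:])
    for i in range(12):
        stds[i] = data[months == i + 1].std(axis=0, ddof=1)
    return stds

def monthly_std_reshape(data):
    """Fold the time axis into (years, 12) and reduce over the years.

    Assumes the series starts in January and covers whole years.
    """
    n_time, n_y, n_x = data.shape
    if n_time % 12 != 0:
        raise ValueError("time axis must be a multiple of 12")
    # reshape() returns a new array object, so data keeps its own shape
    folded = data.reshape(n_time // 12, 12, n_y, n_x)
    return folded.std(axis=0, ddof=1)

# Hypothetical test data: 5 years on a deliberately non-square 4 x 6 grid
rng = np.random.RandomState(0)
n_years = 5
data = rng.rand(12 * n_years, 4, 6)
months = np.tile(np.arange(1, 13), n_years)

loop_result = monthly_std_loop(data, months)
reshape_result = monthly_std_reshape(data)
same = np.allclose(loop_result, reshape_result)
```

Using a non-square grid makes any mix-up between the latitude and longitude dimensions show up as a shape mismatch rather than pass unnoticed.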

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
