[ https://issues.apache.org/jira/browse/CLIMATE-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185777#comment-14185777 ]

Ross Laidlaw commented on CLIMATE-341:
--------------------------------------

I've spoken with [~mjoyce] and [~ploikith] to learn about this method.  This is 
my understanding of it:

Given a dataset with values at monthly increments spanning one or more 
complete years, this method calculates the mean for each calendar month 
across all of those years (i.e. the mean annual cycle).  It therefore assumes 
the input has a specific structure, with the number of timesteps being a 
multiple of 12.  Within the method, a temporary 4D numpy data structure 
(number of years x 12 x number of latitudes x number of longitudes) is created 
from the 3D numpy input (number of months x number of latitudes x number of 
longitudes).  The numpy 'mean' function is then called on the 4D array to 
produce a 3D (12 x number of latitudes x number of longitudes) result.

For example, if the dataset has four years of monthly data on a 100 x 100 
grid (48 monthly timesteps, 100 latitudes and 100 longitudes), the dataset's 
3D values array will contain 48 x 100 x 100 = 480,000 elements.  Within the 
original metric, this values array is copied and the copy is reshaped to a 
four-dimensional array (4 x 12 x 100 x 100).  The numpy 'mean' function (using 
axis = 0, the years axis) is then used to calculate the monthly means, 
returning a 12 x 100 x 100 numpy array.
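A minimal numpy sketch of the reshape-and-average step for the example above. It uses random synthetic data as a stand-in for a real dataset and assumes the series starts at the first calendar month and contains only complete years; the variable names are illustrative, not taken from the original metric.

```python
import numpy as np

# Synthetic stand-in for the example: 4 years of monthly data on a 100 x 100 grid.
n_years, n_lats, n_lons = 4, 100, 100
rng = np.random.default_rng(0)
values = rng.random((n_years * 12, n_lats, n_lons))  # shape (48, 100, 100)

# Split the time axis into (year, month). Month index 0 is the first calendar
# month of the series (January if the data starts in January).
by_year = values.reshape(n_years, 12, n_lats, n_lons)  # shape (4, 12, 100, 100)

# Average over the year axis to get one mean per calendar month.
monthly_means = by_year.mean(axis=0)  # shape (12, 100, 100)

print(values.size)          # 480000
print(monthly_means.shape)  # (12, 100, 100)

# Sanity check: the first month's mean equals the mean of timesteps 0, 12, 24, 36.
assert np.allclose(monthly_means[0], values[0::12].mean(axis=0))
```

Because the reshape only regroups the time axis, each slice `by_year[:, m]` holds the same calendar month from every year, which is why averaging over axis 0 yields per-month means.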

The discussion also drew out the following points:

* This is more of a dataset manipulation than a metric.  It produces an 
intermediate product that can then be used with metrics, for example an anomaly 
calculation metric (by subtracting the means from another set of values)
* This method could therefore be moved to dataset_processor.py (or utils.py as 
a temporary home)
* If the output of the method is a Dataset object containing the means, this 
could then be used with the metrics in the new design/architecture (e.g. 'Bias' 
or similar to calculate anomalies).
* In addition to monthly means, it might be useful to have a daily means 
calculation/option.
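To illustrate the anomaly use case from the first bullet, here is a sketch that subtracts the matching calendar-month mean from every timestep. The function name `monthly_anomalies` is hypothetical (it is not part of OCW), and it works on plain numpy arrays rather than Dataset objects or the 'Bias' metric in the new design.

```python
import numpy as np


def monthly_anomalies(values, monthly_means):
    """Subtract the matching calendar-month mean from every timestep.

    values: array of shape (n_months, n_lats, n_lons), n_months a multiple of 12
    monthly_means: array of shape (12, n_lats, n_lons)
    Returns an array with the same shape as `values`.
    """
    n_months = values.shape[0]
    if n_months % 12 != 0:
        raise ValueError("number of timesteps must be a multiple of 12")
    by_year = values.reshape(n_months // 12, 12, *values.shape[1:])
    # Broadcasting aligns monthly_means (12, lat, lon) with each year's 12 months.
    anomalies = by_year - monthly_means
    return anomalies.reshape(values.shape)


# Usage: two years of monthly data on a 2 x 3 grid.
data = np.arange(24 * 2 * 3, dtype=float).reshape(24, 2, 3)
means = data.reshape(2, 12, 2, 3).mean(axis=0)
anoms = monthly_anomalies(data, means)
```

For each calendar month, the anomalies summed across years come out to zero, which is a quick way to check that the means and values were aligned correctly.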


Here are some questions based on the above points:
* If we return a Dataset object from the method, how do we populate the 
'times' field?  This should be a one-dimensional array of Python datetime 
objects.  There will be 12 values (one for each month), but year and day of 
the month are required when creating datetime objects.  Should we set them 
to 1st Jan, 1st Feb, etc. of an arbitrary year?
* For calculating daily means, how should we deal with leap years?  Perhaps we 
should have a separate method for daily means that can handle Feb 29th / March 
1st indexing of timesteps so it doesn't accidentally mix these together.
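One possible way to handle both questions, sketched under stated assumptions. `month_times` builds the 12 'times' values as the 1st of each month of a placeholder year (`PLACEHOLDER_YEAR = 2000` is an arbitrary choice). `daily_means` groups timesteps by (month, day) so that Feb 29th gets its own entry, averaged over leap years only, and is never merged with Mar 1st. Both function names are hypothetical, and grouping by (month, day) is just one option; a day-of-year index would need extra leap-year handling.

```python
import datetime

import numpy as np

# Arbitrary placeholder year (an assumption): only the month is meaningful.
PLACEHOLDER_YEAR = 2000


def month_times(year=PLACEHOLDER_YEAR):
    """Return a 1D array of 12 datetimes: the 1st of each month of `year`."""
    return np.array([datetime.datetime(year, month, 1) for month in range(1, 13)])


def daily_means(values, times):
    """Average `values` (time, lat, lon) by calendar day, keyed on (month, day).

    `times` is a sequence of date/datetime objects, one per timestep.
    Feb 29th is kept as its own key, so it is never mixed with Mar 1st.
    Returns (sorted list of (month, day) keys, array of shape (n_keys, lat, lon)).
    """
    keys = [(t.month, t.day) for t in times]
    unique_keys = sorted(set(keys))
    means = np.empty((len(unique_keys),) + values.shape[1:])
    for i, key in enumerate(unique_keys):
        # Indices of all timesteps that fall on this calendar day.
        idx = [j for j, k in enumerate(keys) if k == key]
        means[i] = values[idx].mean(axis=0)
    return unique_keys, means
```

Keying on (month, day) rather than on position within the year keeps Mar 1st aligned across leap and non-leap years, at the cost of Feb 29th being averaged over fewer samples.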


Given the above questions, perhaps as an intermediate step we could transfer 
the method over to the utils.py module and output the means array (12 x number 
of latitudes x number of longitudes).
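As a rough idea of that intermediate step, a utils.py function could look like the sketch below. The snake_case name `calc_annual_cycle_means` and the input checks are assumptions, not the existing metric's code; it returns a plain 12 x number of latitudes x number of longitudes numpy array rather than a Dataset.

```python
import numpy as np


def calc_annual_cycle_means(values):
    """Return the mean of each calendar month across all years.

    values: array of shape (n_months, n_lats, n_lons), where n_months is a
    non-zero multiple of 12 and the series starts at the first month of a year.
    Returns an array of shape (12, n_lats, n_lons).
    """
    values = np.asarray(values)
    if values.ndim != 3:
        raise ValueError("expected a 3D (time, lat, lon) array")
    n_months = values.shape[0]
    if n_months == 0 or n_months % 12 != 0:
        raise ValueError("number of timesteps must be a non-zero multiple of 12")
    n_years = n_months // 12
    # Regroup time as (year, month) and average over the year axis.
    return values.reshape(n_years, 12, *values.shape[1:]).mean(axis=0)
```

Raising an error on a partial year, rather than silently truncating, makes the "multiple of 12" assumption explicit to callers.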

> Refactoring metric "calcAnnualCycleMeans"
> -----------------------------------------
>
>                 Key: CLIMATE-341
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-341
>             Project: Apache Open Climate Workbench
>          Issue Type: Sub-task
>          Components: metrics
>    Affects Versions: 0.3-incubating
>            Reporter: Maziyar Boustani
>            Assignee: Ross Laidlaw
>             Fix For: 0.5
>
>
> Refactoring metric "calcAnnualCycleMeans" from [1] to new metrics [2].
> [1]: https://svn.apache.org/repos/asf/incubator/climate/trunk/rcmet/src/main/python/rcmes/toolkit/metrics.py
> [2]: https://svn.apache.org/repos/asf/incubator/climate/trunk/ocw/metrics.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
