Ibrahim Jarif created CLIMATE-791:
-------------------------------------

             Summary: temporal_subset does not work for month_start > month_end
                 Key: CLIMATE-791
                 URL: https://issues.apache.org/jira/browse/CLIMATE-791
             Project: Apache Open Climate Workbench
          Issue Type: Bug
            Reporter: Ibrahim Jarif
            Assignee: Ibrahim Jarif
            Priority: Minor


The following function does not work as expected when the *month_start* is 
*greater* than *month_end*. The problem occurs with the last year in the 
*target_dataset*.

{code}
def temporal_subset(month_start, month_end, target_dataset, 
average_each_year=False):
    """ Temporally subset data given month_index.

    :param month_start: An integer for beginning month (Jan=1)
    :type month_start: :class:`int`

    :param month_end: An integer for ending month (Jan=1)
    :type month_end: :class:`int`

    :param target_dataset: Dataset object that needs temporal subsetting
    :type target_dataset: Open Climate Workbench Dataset Object

    :param average_each_year: If True, output dataset is averaged for each year
    :type average_each_year: :class:'boolean'

    :returns: A temporal subset OCW Dataset
    :rtype: Open Climate Workbench Dataset Object
    """

    if month_start > month_end:
        month_index = range(month_start,13)
        month_index.extend(range(1, month_end+1))
    else:
        month_index = range(month_start, month_end+1)

    dates = target_dataset.times
    months = np.array([d.month for d in dates])
    time_index = []
    for m_value in month_index:
        time_index = np.append(time_index, np.where(months == m_value)[0])
        if m_value == month_index[0]:
            time_index_first = np.min(np.where(months == m_value)[0])
        if m_value == month_index[-1]:
            time_index_last = np.max(np.where(months == m_value)[0])

    time_index = np.sort(time_index)

    time_index = time_index[np.where((time_index >= time_index_first) & 
(time_index <= time_index_last))]

    time_index = list(time_index)

    new_dataset = ds.Dataset(target_dataset.lats,
                             target_dataset.lons,
                             target_dataset.times[time_index],
                             target_dataset.values[time_index,:],
                             variable=target_dataset.variable,
                             units=target_dataset.units,
                             name=target_dataset.name)

    if average_each_year:
        nmonth = len(month_index)
        ntime  = new_dataset.times.size
        nyear = ntime/nmonth
        averaged_time = []              
        ny, nx = target_dataset.values.shape[1:]
        averaged_values =ma.zeros([nyear, ny, nx])
        for iyear in np.arange(nyear):
            # centered time index of the season between month_start and 
month_end in each year
            center_index = int(nmonth/2)+iyear*nmonth   
            if nmonth == 1:
                center_index = iyear
            averaged_time.append(new_dataset.times[center_index]) 
            averaged_values[iyear,:] = 
ma.average(new_dataset.values[nmonth*iyear:nmonth*iyear+nmonth,:], axis=0)
        new_dataset = ds.Dataset(target_dataset.lats,
                                 target_dataset.lons,
                                 np.array(averaged_time),
                                 averaged_values,
                                 variable=target_dataset.variable,
                                 units=target_dataset.units,
                                 name=target_dataset.name)
    
    return new_dataset
{code}

Example: 
{code}
month_start = 8
month_end = 1
time = [2000,1,1,0,0,
            2000,2,1,0,0,
            2000,3,1,0,0,
            2000,4,1,0,0,
            2000,5,1,0,0,
            2000,6,1,0,0,
            2000,7,1,0,0,
            2000,8,1,0,0,
            2000,9,1,0,0,
            2000,10,1,0,0,
            2000,11,1,0,0,
            2000,12,1,0,0]
{code}

The returned dataset should have 
{code} 
time = [2000,1,1,0,0,
            2000,8,1,0,0,
            2000,9,1,0,0,
            2000,10,1,0,0
            2000,11,1,0,0
            2000,12,1,0,0] 
{code} But it has {code} time = [2000,1,1,0,0] {code} All the other time values 
are removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to