Ibrahim Jarif created CLIMATE-791: ------------------------------------- Summary: temporal_subset does not work for month_start > month_end Key: CLIMATE-791 URL: https://issues.apache.org/jira/browse/CLIMATE-791 Project: Apache Open Climate Workbench Issue Type: Bug Reporter: Ibrahim Jarif Assignee: Ibrahim Jarif Priority: Minor
The following function does not work as expected when the *month_start* is *greater* than *month_end*. The problem occurs with the last year in the *target_dataset*. {code} def temporal_subset(month_start, month_end, target_dataset, average_each_year=False): """ Temporally subset data given month_index. :param month_start: An integer for beginning month (Jan=1) :type month_start: :class:`int` :param month_end: An integer for ending month (Jan=1) :type month_end: :class:`int` :param target_dataset: Dataset object that needs temporal subsetting :type target_dataset: Open Climate Workbench Dataset Object :param average_each_year: If True, output dataset is averaged for each year :type average_each_year: :class:'boolean' :returns: A temporal subset OCW Dataset :rtype: Open Climate Workbench Dataset Object """ if month_start > month_end: month_index = range(month_start,13) month_index.extend(range(1, month_end+1)) else: month_index = range(month_start, month_end+1) dates = target_dataset.times months = np.array([d.month for d in dates]) time_index = [] for m_value in month_index: time_index = np.append(time_index, np.where(months == m_value)[0]) if m_value == month_index[0]: time_index_first = np.min(np.where(months == m_value)[0]) if m_value == month_index[-1]: time_index_last = np.max(np.where(months == m_value)[0]) time_index = np.sort(time_index) time_index = time_index[np.where((time_index >= time_index_first) & (time_index <= time_index_last))] time_index = list(time_index) new_dataset = ds.Dataset(target_dataset.lats, target_dataset.lons, target_dataset.times[time_index], target_dataset.values[time_index,:], variable=target_dataset.variable, units=target_dataset.units, name=target_dataset.name) if average_each_year: nmonth = len(month_index) ntime = new_dataset.times.size nyear = ntime/nmonth averaged_time = [] ny, nx = target_dataset.values.shape[1:] averaged_values =ma.zeros([nyear, ny, nx]) for iyear in np.arange(nyear): # centered time index of the season between month_start and month_end in each year center_index = int(nmonth/2)+iyear*nmonth if nmonth == 1: center_index = iyear averaged_time.append(new_dataset.times[center_index]) averaged_values[iyear,:] = ma.average(new_dataset.values[nmonth*iyear:nmonth*iyear+nmonth,:], axis=0) new_dataset = ds.Dataset(target_dataset.lats, target_dataset.lons, np.array(averaged_time), averaged_values, variable=target_dataset.variable, units=target_dataset.units, name=target_dataset.name) return new_dataset {code} Example: {code} month_start = 8 month_end = 1 time = [2000,1,1,0,0, 2000,2,1,0,0, 2000,3,1,0,0, 2000,4,1,0,0, 2000,5,1,0,0, 2000,6,1,0,0, 2000,7,1,0,0, 2000,8,1,0,0, 2000,9,1,0,0, 2000,10,1,0,0, 2000,11,1,0,0, 2000,12,1,0,0] {code} The returned dataset should have {code} time = [2000,1,1,0,0, 2000,8,1,0,0, 2000,9,1,0,0, 2000,10,1,0,0 2000,11,1,0,0 2000,12,1,0,0] {code} But it has {code} time = [2000,1,1,0,0] {code} All the other time values are removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)