[
https://issues.apache.org/jira/browse/CLIMATE-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743323#comment-13743323
]
Alex Goodman commented on CLIMATE-248:
--------------------------------------
Hi Cam,
Very interesting. I suspect the problem stems not only from the need to
combine fancy indexing with a loop, but also from the fact that masked array
arithmetic is noticeably slower than the equivalent ndarray operations. It
might be worth checking whether changing that line to:
{code}meanstore[i,:,:] = np.average(datam,axis=0){code}
produces a considerable performance increase, even though the result will
obviously be wrong. In my own experience, masked arrays can be up to 3x
slower.
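A quick micro-benchmark would confirm that overhead in isolation. Here is a minimal sketch comparing ma.average against np.average on data of a roughly comparable shape (the array shape, mask fraction, and iteration count below are illustrative assumptions, not taken from the actual test):

```python
import timeit
import numpy as np
import numpy.ma as ma

# Illustrative stand-in: ~30 daily slices of a 1x1 degree global grid
data = np.random.rand(30, 180, 360)
# Hypothetical missing-data mask flagging ~25% of values
mask = np.random.rand(30, 180, 360) > 0.75
datam = ma.masked_array(data, mask)

# Time the masked and unmasked averages over the time axis
t_ma = timeit.timeit(lambda: ma.average(datam, axis=0), number=20)
t_np = timeit.timeit(lambda: np.average(data, axis=0), number=20)
print("ma.average: %.4fs  np.average: %.4fs  ratio: %.1fx"
      % (t_ma, t_np, t_ma / t_np))
```

The exact ratio will vary with array shape and mask density, so the numbers here are only indicative.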
> PERFORMANCE - Rebinning Daily to Monthly datasets takes a really long time
> --------------------------------------------------------------------------
>
> Key: CLIMATE-248
> URL: https://issues.apache.org/jira/browse/CLIMATE-248
> Project: Apache Open Climate Workbench
> Issue Type: Improvement
> Components: regridding
> Affects Versions: 0.1-incubating, 0.2-incubating
> Environment: *nix
> Reporter: Cameron Goodale
> Assignee: Cameron Goodale
> Labels: performance
> Fix For: 0.3-incubating
>
> Attachments: inital_profile.txt, test.py
>
>
> When I was testing the dataset_processor module I noticed that most tests
> would complete in less than 1 second. Then I came across the
> "test_daily_to_monthly_rebin" test and it would take over 2 minutes to
> complete.
> The test initially used a 1x1 degree grid covering the globe and daily time
> step for 2 years (730 days).
> I ran some initial checks and the slowdown appears to be in the code where
> the data is rebinned, inside '_rcmes_calc_average_on_new_time_unit_K'.
> {code}
> mask = np.zeros_like(data)
> mask[timeunits!=myunit,:,:] = 1.0
> # Calculate missing data mask within each time unit...
> datamask_at_this_timeunit = np.zeros_like(data)
> datamask_at_this_timeunit[:] = process.create_mask_using_threshold(data[timeunits==myunit,:,:],threshold=0.75)
> # Store results for masking later
> datamask_store.append(datamask_at_this_timeunit[0])
> # Calculate means for each pixel in this time unit, ignoring missing data (using masked array).
> datam = ma.masked_array(data,np.logical_or(mask,datamask_at_this_timeunit))
> meanstore[i,:,:] = ma.average(datam,axis=0)
> {code}
> That block is suspect since the rest of the code is doing simple string
> parsing and appending to lists. I don't have the time to do a deep dive into
> this now, and it technically isn't broken; it is just really slow.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira