Re: [matplotlib-devel] low-hanging fruit on the transforms branch

Eric Firing Tue, 06 Nov 2007 11:08:54 -0800

Mike,

Thanks for the quick response.


I was wrong as usual: the masked array overhead in your original version 
of the path initializer was actually small.  I misinterpreted the 
kcachegrind display.  Rats!  I was hoping for a big gain.  It looks like 
anything that makes a huge number of paths is going to be slow, no 
matter what we do to try to optimize the path initializer.

A partial solution for pcolor is pcolormesh, using the quadmesh 
extension code, although that still has a bug. (Paul Kienzle was going 
to look into it.) Is the quadmesh extension compatible with your 
transforms branch?

My impression is that the transforms branch is going to be a big step 
forward, with performance improvements in some areas, at worst minor 
penalties in others--except for some problems like pcolor that need to 
be solved.

In order to replace Matlab in my application, a very fast interactive 
pcolor-type capability is absolutely essential.  I think this simply has 
to be done via extension code, like quadmesh and the image codes. 
(Pcolor in the trunk isn't fast enough, either.)  Unfortunately, I have 
found those codes hard to understand.  Only the regular-grid image code 
is fully integrated into the trunk, and even it has a long-standing bug 
revealed by extreme zooming.  The irregular-grid image routine might be 
a big help, but it has never been integrated.  I don't remember which 
bugs it shares with quadmesh and image, if any.

Eric



Michael Droettboom wrote:
> Eric Firing wrote:
>> Mike,
>>
>> I made a quick test and took a quick look, and I certainly see a ripe 
>> mango within reach.  I don't know what your constraints and strategy 
>> are, but I thought I would give you the off-the-cuff idea before I 
>> forget what I did.
>>
>> The test was pcolortest.py, and the kcachegrind input is the .log file.
>>
>> The problem is the path initializer: it is converting everything to a 
>> masked array, which in the vast majority of cases is not needed, and is 
>> very costly.
> 
> Thanks for finding this.  I agree completely.  I think that was 
> basically a typo that ended up "working", just suboptimally.  The input 
> to the path constructor may be either a numpy array, an ma array or a 
> regular Python sequence.  If it's the first two, it should be left alone 
> (if there is an array mask, it is dealt with later on in the 
> constructor), but if the latter, it should be converted to a numpy array.
> 
> What I meant to type was:
> 
>          if not ma.isMaskedArray(vertices):
>              vertices = npy.asarray(vertices, npy.float_)
> 
> The argument against just "npy.asarray(vertices, npy.float_)" is that 
> the mask needs to be preserved.
> 
> If I understand correctly, that will be essentially a no-op when the 
> input is a numpy array, albeit with the overhead of some checks.
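
A minimal sketch of the check described above, for anyone who wants to try it outside the branch. The helper name as_vertices is hypothetical and is not the actual Path constructor; it only assumes a recent numpy where numpy.ma.isMaskedArray and numpy.asarray behave as documented.

```python
import numpy as np
import numpy.ma as ma


def as_vertices(vertices):
    """Hypothetical helper, not the branch's actual Path code."""
    # Masked arrays pass through untouched so the mask is preserved.
    if not ma.isMaskedArray(vertices):
        # A float ndarray comes back as the same object (no copy);
        # a Python sequence is converted to a new float array.
        vertices = np.asarray(vertices, dtype=float)
    return vertices


plain = np.zeros((4, 2))
masked = ma.masked_invalid(np.array([[0.0, 1.0], [np.nan, 2.0]]))
seq = [(0, 0), (1, 1)]

same_plain = as_vertices(plain) is plain      # ndarray: no conversion
same_masked = as_vertices(masked) is masked   # masked array: left alone
converted = as_vertices(seq)                  # sequence: becomes ndarray
```

The only per-call cost for array inputs is the isMaskedArray test plus, for plain ndarrays, the dtype check inside asarray.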
> 
>> We need to think carefully about the levels of API, and what should be 
>> done at which levels.  One possibility is that at the level of the path 
>> initializer, only ordinary ndarrays should be acceptable--any mask 
>> manipulations and compressions should already have been done.  This 
>> would require a helper function to generate the codes for that case. 
>> Another is that the path initializer could get a flag telling it whether 
>> to check for masked arrays.  And another is that a check for existence 
>> of a mask should be done at the start, and the mask processing done only 
>> if there is a mask.
> 
> This option was the intent.
> 
>> Yet another is that if a mask is needed, it be 
>> passed in as an optional 1-D array kwarg.  An advantage of this is that 
>> the code that calls the path initializer may be in a better position to 
>> know what is needed to generate the 1-D mask (that is, a mask for each 
>> (x,y) point rather than for x and y separately)--that mask may already 
>> be sitting around.
> 
> Many of these options I fear would significantly complicate the code. 
> One of the driving motivations for the refactoring is to allow 
> transformations to be combined more generally.  Think of the case where 
> you have a polar plot with a logarithmic scale on the r-axis (this 
> wasn't ever possible in the trunk).  The log scale means that there is 
> potential for negative masked values, but the polar part of the 
> transformation shouldn't have to know or care whether masked values are 
> being passed through.  Requiring it to do so would need the same checks 
> currently performed in the Path constructor, but they would be copied 
> all over the code in every kind of new transformation.
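
To illustrate the polar-plus-log case, here is a rough sketch in plain numpy. The function names log_r and polar_to_xy are made up for this example and are not the transforms-branch classes. The point is only that the mask created by the log step rides along through ordinary arithmetic, so the polar step contains no mask handling at all.

```python
import numpy as np
import numpy.ma as ma


def log_r(r):
    """Hypothetical log-scale step: non-positive radii become masked."""
    return ma.log10(ma.masked_less_equal(r, 0.0))


def polar_to_xy(theta, r):
    """Hypothetical polar step: plain arithmetic, no mask checks."""
    return r * np.cos(theta), r * np.sin(theta)


theta = np.zeros(3)
r = np.array([-1.0, 1.0, 10.0])

x, y = polar_to_xy(theta, log_r(r))
x_mask = ma.getmaskarray(x)  # mask carried through the polar step
```

Only the first point ends up masked in x and y, even though polar_to_xy never looked at a mask.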
> 
> FWIW, there already is a deliberate "quarantining" of masked arrays -- 
> it happens where the logical elements of the plot hit the drawing 
> commands of the plot (the Path object).  It could have been implemented 
> such that the backends must understand masked arrays and draw 
> accordingly, but it proved to be faster (based on the simple_plot_fps.py 
> benchmark) to convert to a non-masked array with MOVETO codes upfront 
> and reuse that.  (Not surprising, given the overhead of masked arrays.) 
> This means that masked arrays are not used at all during panning and 
> zooming operations where speed is perhaps the most crucial.
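
The upfront conversion might look roughly like the sketch below. This is an assumption-laden illustration, not the code on the branch: flatten_masked is a hypothetical name, and the MOVETO/LINETO values are just illustrative constants for this example. A point counts as masked if either coordinate is masked, masked points are dropped, and each point that follows a gap gets a MOVETO so the line is broken there.

```python
import numpy as np
import numpy.ma as ma

MOVETO, LINETO = 1, 2  # illustrative code values for this sketch


def flatten_masked(vertices):
    """Hypothetical helper: masked (N, 2) polyline -> plain vertices + codes."""
    vertices = ma.asarray(vertices, dtype=float)
    # A point is unusable if either its x or its y is masked.
    bad = ma.getmaskarray(vertices).any(axis=1)
    # A new segment starts at the first point and after every masked point.
    starts = np.ones(len(bad), dtype=bool)
    starts[1:] = bad[:-1]
    codes = np.where(starts, MOVETO, LINETO).astype(np.uint8)
    good = ~bad
    # Plain ndarrays come out; no masked arrays are needed downstream.
    return vertices.data[good], codes[good]


pts = ma.masked_invalid(
    np.array([[0.0, 0.0], [1.0, 1.0], [np.nan, 2.0], [3.0, 3.0], [4.0, 4.0]])
)
verts, codes = flatten_masked(pts)
```

Because the result is an ordinary array plus a code array, it can be computed once and reused on every redraw.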
> 
>> Masked arrays are pretty clunky and slow.  The maskedarray 
>> implementation by Pierre GM is nicer, more complete, and faster for many 
>> operations than numpy.ma, but it still adds a lot of overhead, 
>> especially for small arrays.  (It needs to have its core in C; so far I 
>> have failed dismally in trying to understand how to do that without 
>> repeating the bulk of the ndarray code.)
>>
>> A related point: can you (or is it OK if I do it) change all the "import 
>> numpy.ma as ma" or whatever to "from matplotlib.numerix import npyma as 
>> ma"?  The advantage is that it makes it easy to test the new version 
>> with either maskedarray or ma.  This should be temporary; I am still 
>> hoping and expecting that maskedarray will replace ma in the core numpy 
>> distribution.
> 
> That sounds like a very good idea.  I'll go ahead and do this (on the 
> branch only).
> 
> Cheers,
> Mike
> 


