Re: [matplotlib-devel] low-hanging fruit on the transforms branch

Michael Droettboom Tue, 06 Nov 2007 06:03:46 -0800

Eric Firing wrote:
> Mike,
> 
> I made a quick test and took a quick look, and I certainly see a ripe 
> mango within reach.  I don't know what your constraints and strategy 
> are, but I thought I would give you the off-the-cuff idea before I 
> forget what I did.
> 
> The test was pcolortest.py, and the kcachegrind input is the .log file.
> 
> The problem is the path initializer: it is converting everything to a 
> masked array, which in the vast majority of cases is not needed, and is 
> very costly.


Thanks for finding this.  I agree completely.  I think that was 
basically a typo that ended up "working", just suboptimally.  The input 
to the path constructor may be either a numpy array, an ma array or a 
regular Python sequence.  If it's the first two, it should be left alone 
(if there is an array mask, it is dealt with later on in the 
constructor), but if the latter, it should be converted to a numpy array.

What I meant to type was:

         if not ma.isMaskedArray(vertices):
             vertices = npy.asarray(vertices, npy.float_)

The argument against just "npy.asarray(vertices, npy.float_)" is that 
the mask needs to be preserved.

If I understand correctly, that will be essentially a no-op when the 
input is a numpy array, albeit with the overhead of some checks.

> We need to think carefully about the levels of API, and what should be 
> done at which levels.  One possibility is that at the level of the path 
> initializer, only ordinary ndarrays should be acceptable--any mask 
> manipulations and compressions should already have been done.  This 
> would require a helper function to generate the codes for that case. 
> Another is that the path initializer could get a flag telling it whether 
> to check for masked arrays.  And another is that a check for existance 
> of a mask should be done at the start, and the mask processing done only 
> if there is a mask.

This option was the intent.

> Yet another is that if a mask is needed, it be 
> passed in as an optional 1-D array kwarg.  An advantage of this is that 
> the code that calls the path initializer may be in a better position to 
> know what is needed to generate the 1-D mask (that is, a mask for each 
> (x,y) point rather than for x and y separately)--that mask may already 
> be sitting around.

Many of these options I fear would significantly complicate the code. 
One of the driving motivations for the refactoring is to allow 
transformations to be combined more generally.  Think of the case where 
you have a polar plot with a logarithmic scale on the r-axis (this 
wasn't ever possible in the trunk).  The log scale means that there is 
potential for negative masked values, but the polar part of the 
transformation shouldn't have to know or care whether masked values are 
being passed through.  Requiring it to do so would need the same checks 
currently performed in the Path constructor, but they would be copied 
all over the code in every kind of new transformation.

FWIW, there already is a deliberate "quarantining" of masked arrays -- 
it happens where the logical elements of the plot hit the drawing 
commands of the plot (the Path object).  It could have been implemented 
such that the backends must understand masked arrays and draw 
accordingly, but it proved to be faster (based on the simple_plot_fps.py 
benchmark) to convert to a non-masked array with MOVETO codes upfront 
and reuse that.  (Not surprising, given the overhead of masked arrays). 
  This means that masked arrays are not used at all during panning and 
zooming operations where speed is perhaps the most crucial.

> Masked arrays are pretty clunky and slow.  The maskedarray 
> implementation by Pierre GM is nicer, more complete, and faster for many 
> operations than numpy.ma, but it still adds a lot of overhead, 
> especially for small arrays.  (It needs to have its core in C; so far I 
> have failed dismally in trying to understand how to do that without 
> repeating the bulk of the ndarray code.)
> 
> A related point: can you (or is it OK if I do it) change all the "import 
> numpy.ma as ma" or whatever to "from matplotlib.numerix import npyma as 
> ma"?  The advantage is that it makes it easy to test the new version 
> with either maskedarray or ma.  This should be temporary; I am still 
> hoping and expecting that maskedarray will replace ma in the core numpy 
> distribution.

That sounds like a very good idea.  I'll go ahead and do this (on the 
branch only).

Cheers,
Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Matplotlib-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

Re: [matplotlib-devel] low-hanging fruit on the transforms branch

Reply via email to