Eric Firing wrote:
> Mike, John,
>
> Because path simplification does not work with anything but a 
> continuous line, it is turned off if there are any nans in the path.  
> The result is that if one does this:
>
> import numpy as np
> xx = np.arange(200000)
> yy = np.random.rand(200000)
> #plot(xx, yy)
> yy[1000] = np.nan
> plot(xx, yy)
>
> the plot fails with an incomplete rendering and general 
> unresponsiveness; apparently some mysterious agg limit is quietly 
> exceeded.
The limit in question is "cell_block_limit" in 
agg_rasterizer_cells_aa.h.  I suspect the relationship between the 
number of vertices and the number of rasterization cells depends on the 
nature of the values. 

However, if we want to increase the limit, each "cell_block" is 4096 
cells, each with 16 bytes, and currently it maxes out at 1024 cell 
blocks, for a total of 67,108,864 bytes.  So, the question is, how much 
memory should be devoted to rasterization, when the data set is large 
like this?  I think we could safely quadruple this number for a lot of 
modern machines, and this maximum won't affect people plotting smaller 
data sets, since the memory is dynamically allocated anyway.  It works 
for me, but I have 4GB RAM here at work.
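
As a quick check of the arithmetic above, here is a small Python sketch.  
The constant and function names are made up for illustration and are not 
identifiers from Agg; only the numbers come from the paragraph above.

```python
# Figures quoted above; the names are illustrative, not Agg identifiers.
CELLS_PER_BLOCK = 4096
BYTES_PER_CELL = 16
MAX_CELL_BLOCKS = 1024  # the current cap ("cell_block_limit")


def max_rasterizer_bytes(max_blocks):
    """Upper bound on cell memory for a given cap on cell blocks."""
    return max_blocks * CELLS_PER_BLOCK * BYTES_PER_CELL


current = max_rasterizer_bytes(MAX_CELL_BLOCKS)
quadrupled = max_rasterizer_bytes(4 * MAX_CELL_BLOCKS)

print(current, "bytes =", current // 2**20, "MiB")        # 64 MiB
print(quadrupled, "bytes =", quadrupled // 2**20, "MiB")  # 256 MiB
```

So quadrupling the cap would allow up to 256 MiB for cells in the worst 
case, and nothing extra for small plots.
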
> With or without the nan, this test case also shows the bizarre 
> slowness of add_line that I asked about in a message yesterday, and 
> that has me completely baffled.
lsprofcalltree is my friend!
>
> Both of these are major problems for real-world use.
>
> Do you have any thoughts on timing and strategy for solving this 
> problem?  A few weeks ago, when the problem with nans and path 
> simplification turned up, I tried to figure out what was going on and 
> how to fix it, but I did not get very far.  I could try again, but as 
> you know I don't get along well with C++.
That simplification code is pretty hairy, particularly because it tries 
to avoid a copy by doing everything in an iterator/generator way.  I 
think even just supporting MOVETOs there would be tricky, but it is 
probably the easiest first step.
>
> I am also wondering whether more than straightforward path 
> simplification with nan/moveto might be needed.  Suppose there is a 
> nightmarish time series with every third point being bad, so it is 
> essentially a sequence of 2-point line segments.  The simplest form of 
> path simplification fix might be to reset the calculation whenever a 
> moveto is encountered, but this would yield no simplification in this 
> case.  I assume Agg would still choke. Is there a need for some sort 
> of automatic chunking of the rendering operation in addition to path 
> simplification?
>
Chunking is probably something worth looking into (for lines, at least), 
as it might also reduce memory usage vs. the "increase the 
cell_block_limit" scenario.
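
To illustrate what I mean by chunking, here is a rough Python sketch.  It 
is not existing matplotlib code: the function name and the default chunk 
size are hypothetical, and the right size would have to be found by 
experiment.  The idea is to cut a long polyline into pieces that share one 
vertex, so each piece can be rasterized on its own without dropping the 
segment that joins two pieces.

```python
import numpy as np


def iter_line_chunks(x, y, chunk_size=20000):
    """Yield (x, y) pieces of a polyline, at most chunk_size vertices each.

    Consecutive pieces share one vertex, so the segment joining them is
    kept.  Inputs with fewer than two vertices yield nothing, since they
    contain no segment to draw.  (Hypothetical helper, for illustration.)
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    n = len(x)
    start = 0
    while start < n - 1:
        stop = min(start + chunk_size, n)
        yield x[start:stop], y[start:stop]
        # Repeat the last vertex as the first vertex of the next piece.
        start = stop - 1


xx = np.arange(200000)
yy = np.random.rand(200000)
pieces = list(iter_line_chunks(xx, yy))
print(len(pieces), "pieces")
```

Each piece could then be drawn as a separate path.  Joins and dash 
patterns at the piece boundaries would not match the unchunked line 
exactly, which is one reason I'd only consider this for plain lines.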

I also think for the special case of high-resolution time series data, 
where x is uniform, there is an opportunity to do something completely 
different that should be far faster.  Audio editors (such as Audacity) 
draw each column of pixels based on the min/max and/or mean and/or RMS 
of the values within that column.  This makes the rendering extremely 
fast and simple.  See:

http://audacity.sourceforge.net/about/images/audacity-macosx.png

Of course, that would mean writing a bunch of new code, but it shouldn't 
be incredibly tricky new code.  It could convert the time series data to 
an image and plot that, or to a filled polygon whose vertices are 
downsampled from the original data.  The latter may be nicer for PS/PDF 
output.
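
A minimal sketch of the min/max idea in Python with numpy follows.  This 
is only an illustration under my own assumptions, not code from 
matplotlib or Audacity: the function names are hypothetical, x is assumed 
uniform so that samples map to pixel columns by index, and NaNs are 
simply ignored within a column.

```python
import numpy as np


def minmax_envelope(y, n_columns):
    """Reduce uniformly sampled y to a per-column (min, max) pair.

    Returns (lo, hi), each of length n_columns.  NaNs are ignored; a
    column containing no valid samples gets NaN in both arrays.
    """
    y = np.asarray(y, dtype=float)
    # Split the sample indices into n_columns contiguous, nearly equal bins.
    edges = np.linspace(0, len(y), n_columns + 1).astype(int)
    lo = np.full(n_columns, np.nan)
    hi = np.full(n_columns, np.nan)
    for i in range(n_columns):
        col = y[edges[i]:edges[i + 1]]
        col = col[~np.isnan(col)]
        if col.size:
            lo[i] = col.min()
            hi[i] = col.max()
    return lo, hi


def envelope_polygon(lo, hi):
    """Vertices of a closed outline: along the maxima, back along the minima."""
    cols = np.arange(len(lo))
    xs = np.concatenate([cols, cols[::-1]])
    ys = np.concatenate([hi, lo[::-1]])
    return np.column_stack([xs, ys])


yy = np.random.rand(200000)
yy[1000] = np.nan
lo, hi = minmax_envelope(yy, 800)   # e.g. an axes 800 pixels wide
verts = envelope_polygon(lo, hi)
print(verts.shape)                  # (1600, 2)
```

The 200000 points collapse to 1600 polygon vertices regardless of where 
the NaNs fall.  Columns that come back as NaN would still need to be 
handled by the caller, for example by splitting the polygon there.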

Cheers,
Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA


_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
