Re: [matplotlib-devel] path simplification with nan (or move_to)

Eric Firing Wed, 08 Oct 2008 01:30:15 -0700

The patch in my last message was clearly not quite right.  I 
have gone through several iterations and have come tantalizingly 
close, but I still don't have it right.  I need to leave it alone 
for a while, but I do think it is important to get this working 
correctly ASAP--certainly it is for my own work, at least.


What happens with a nan should be somewhat similar to what happens with 
clipping, so perhaps one could take advantage of part of the clipping 
logic, but I have not looked at this approach closely.

Eric


Eric Firing wrote:
> Michael Droettboom wrote:
>> Eric Firing wrote:
>>> Mike, John,
>>>
>>> Because path simplification does not work with anything but a 
>>> continuous line, it is turned off if there are any nans in the path.  
>>> The result is that if one does this:
>>>
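>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> xx = np.arange(200000)
>>> yy = np.random.rand(200000)
>>> #plt.plot(xx, yy)    # without the nan
>>> yy[1000] = np.nan    # a single nan turns path simplification off
>>> plt.plot(xx, yy)
>>> plt.show()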
>>>
>>> the plot fails with an incomplete rendering and general 
>>> unresponsiveness; apparently some mysterious agg limit is quietly 
>>> exceeded.
>> The limit in question is "cell_block_limit" in 
>> agg_rasterizer_cells_aa.h.  I suspect the relationship between the 
>> number of vertices and the number of rasterization cells depends on 
>> the nature of the values.
>> However, if we want to increase the limit, each "cell_block" is 4096 
>> cells, each with 16 bytes, and currently it maxes out at 1024 cell 
>> blocks, for a total of 67,108,864 bytes.  So, the question is, how 
>> much memory should be devoted to rasterization, when the data set is 
>> large like this?  I think we could safely quadruple this number for a 
>> lot of modern machines, and this maximum won't affect people plotting 
>> smaller data sets, since the memory is dynamically allocated anyway.  
>> It works for me, but I have 4GB RAM here at work.
> 
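The arithmetic behind the figures quoted above can be checked with a short sketch. The constant names and the helper max_cell_bytes are made up for illustration; only the three numbers (4096 cells per block, 16 bytes per cell, a limit of 1024 blocks) come from the message, and the real limit lives in the C++ header, not in Python.

```python
# Figures quoted in the message above; the names here are illustrative only.
CELLS_PER_BLOCK = 4096     # cells in one "cell_block"
BYTES_PER_CELL = 16        # size of one cell, as stated in the message
CELL_BLOCK_LIMIT = 1024    # current maximum number of cell blocks


def max_cell_bytes(block_limit):
    """Upper bound on rasterizer cell memory for a given block limit."""
    return CELLS_PER_BLOCK * BYTES_PER_CELL * block_limit


current = max_cell_bytes(CELL_BLOCK_LIMIT)
quadrupled = max_cell_bytes(4 * CELL_BLOCK_LIMIT)
print(current, "bytes =", current // 2**20, "MiB")
print(quadrupled, "bytes =", quadrupled // 2**20, "MiB")
```

Quadrupling the limit therefore raises the ceiling from 64 MiB to 256 MiB. This is only a ceiling, since the blocks are allocated on demand.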
> It sounds like we have little to lose by increasing the limit as you 
> suggest here.  In addition, it would be nice if hitting that limit 
> triggered an informative exception instead of a puzzling and quiet 
> failure, but maybe that would be hard to arrange.  I have no idea how to 
> approach it.
> 
>>> With or without the nan, this test case also shows the bizarre 
>>> slowness of add_line that I asked about in a message yesterday, and 
>>> that has me completely baffled.
>> lsprofcalltree is my friend!
> 
> Thank you very much for finding that!
> 
>>>
>>> Both of these are major problems for real-world use.
>>>
>>> Do you have any thoughts on timing and strategy for solving this 
>>> problem?  A few weeks ago, when the problem with nans and path 
>>> simplification turned up, I tried to figure out what was going on and 
>>> how to fix it, but I did not get very far.  I could try again, but as 
>>> you know I don't get along well with C++.
>> That simplification code is pretty hairy, particularly because it 
>> tries to avoid a copy by doing everything in an iterator/generator 
>> way.  I think even just supporting MOVETOs there would be tricky, but 
>> it is probably the easiest first step.
> 
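To picture what "supporting MOVETOs" means for data containing nans, here is a minimal pure-Python sketch. It is not the C++ simplification code and not matplotlib's API: the function codes_for_nan_gaps is hypothetical, and the MOVETO/LINETO constants are local stand-ins whose numeric values are placeholders. The assumed behavior is that each nan is dropped and the next finite point starts a new subpath.

```python
import math

# Local stand-ins for path codes; the numeric values are illustrative only.
MOVETO = 1
LINETO = 2


def codes_for_nan_gaps(xs, ys):
    """Return (vertices, codes), dropping nan points and starting a new
    subpath (MOVETO) at the first finite point after each gap."""
    vertices, codes = [], []
    pen_down = False  # False means the next finite point starts a subpath
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            pen_down = False  # lift the pen; the gap is not drawn
            continue
        vertices.append((x, y))
        codes.append(LINETO if pen_down else MOVETO)
        pen_down = True
    return vertices, codes


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.5, 0.7, float("nan"), 0.2, 0.9]
verts, codes = codes_for_nan_gaps(xs, ys)
print(verts)
print(codes)
```

A simplification pass that resets its state at every MOVETO could then treat each run between gaps as an ordinary continuous line.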
> The attached patch seems to work, based on cursory testing.  I can make 
> an array of 1M points, salt it with nans, and plot it, complete with 
> gaps, and all in a reasonably snappy fashion, thanks to your units fix.
> 
> I will hold off on committing it until I hear from you or John; or if 
> either of you want to polish and commit it (or an alternative), that's 
> even better.
> 
> Eric
> 
>>>
>>> I am also wondering whether more than straightforward path 
>>> simplification with nan/moveto might be needed.  Suppose there is a 
>>> nightmarish time series with every third point being bad, so it is 
>>> essentially a sequence of 2-point line segments.  The simplest form 
>>> of path simplification fix might be to reset the calculation whenever 
>>> a moveto is encountered, but this would yield no simplification in 
>>> this case.  I assume Agg would still choke. Is there a need for some 
>>> sort of automatic chunking of the rendering operation in addition to 
>>> path simplification?
>>>
>> Chunking is probably something worth looking into (for lines, at 
>> least), as it might also reduce memory usage vs. the "increase the 
>> cell_block_limit" scenario.
>>
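The chunking idea can be sketched as follows, assuming one continuous run of vertices that is handed to the renderer in pieces. The helper chunk_polyline and its max_points parameter are hypothetical and not part of matplotlib or Agg. Consecutive pieces share an endpoint so that no line segment is lost at a chunk boundary.

```python
def chunk_polyline(points, max_points):
    """Split one continuous polyline into pieces of at most max_points
    vertices. Consecutive pieces share an endpoint so no segment is lost."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    chunks = []
    start = 0
    last = len(points) - 1
    while start < last:
        stop = min(start + max_points - 1, last)
        chunks.append(points[start:stop + 1])
        start = stop  # reuse the final vertex as the next chunk's first
    return chunks


points = [(float(i), float(i % 3)) for i in range(10)]
chunks = chunk_polyline(points, 4)
for chunk in chunks:
    print(len(chunk), chunk[0], chunk[-1])
```

Each piece could then be rasterized separately, which bounds the work per call regardless of how little the simplification step manages to remove. A polyline with fewer than two vertices has no segments, so the sketch returns no chunks for it.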
>> I also think for the special case of high-resolution time series data, 
>> where x is uniform, there is an opportunity to do something completely 
>> different that should be far faster.  Audio editors (such as 
>> Audacity) draw each column of pixels based on the min/max and/or mean 
>> and/or RMS of the values within that column.  This makes the rendering 
>> extremely fast and simple.  See:
>>
>> http://audacity.sourceforge.net/about/images/audacity-macosx.png
>>
>> Of course, that would mean writing a bunch of new code, but it 
>> shouldn't be incredibly tricky new code.  It could convert the time 
>> series data to an image and plot that, or to a filled polygon whose 
>> vertices are downsampled from the original data.  The latter may be 
>> nicer for Ps/Pdf output.
>>
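A minimal sketch of the per-column min/max reduction described above, assuming uniformly spaced x so that samples map to pixel columns by index alone. The function minmax_columns is hypothetical; how nans are handled (ignored here, with an all-nan or empty column reported as None) is an assumption, not something the message specifies.

```python
import math


def minmax_columns(ys, n_columns):
    """Reduce uniformly spaced samples to one (min, max) pair per pixel
    column. nan samples are ignored; an empty or all-nan column is None."""
    n = len(ys)
    columns = []
    for col in range(n_columns):
        lo = col * n // n_columns        # first sample index in this column
        hi = (col + 1) * n // n_columns  # one past the last sample index
        finite = [y for y in ys[lo:hi] if not math.isnan(y)]
        columns.append((min(finite), max(finite)) if finite else None)
    return columns


samples = [0.1, 0.9, 0.4, float("nan"), 0.3, 0.2, 0.8, 0.5]
envelope = minmax_columns(samples, 4)
print(envelope)
```

The output size depends only on the number of pixel columns, not on the number of samples, so a renderer would draw one vertical span per column or build a filled polygon from the upper and lower envelopes.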
>> Cheers,
>> Mike
>>
> 
> 
> ------------------------------------------------------------------------
> 
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Matplotlib-devel mailing list
> Matplotlib-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

