Eric Firing wrote:
> Michael Droettboom wrote:
>> Eric Firing wrote:
>>> Mike, John,
>>>
>>> Because path simplification does not work with anything but a 
>>> continuous line, it is turned off if there are any nans in the 
>>> path.  The result is that if one does this:
>>>
>>> import numpy as np
>>> xx = np.arange(200000)
>>> yy = np.random.rand(200000)
>>> #plot(xx, yy)
>>> yy[1000] = np.nan
>>> plot(xx, yy)
>>>
>>> the plot fails with an incomplete rendering and general 
>>> unresponsiveness; apparently some mysterious agg limit is quietly 
>>> exceeded.
>> The limit in question is "cell_block_limit" in 
>> agg_rasterizer_cells_aa.h.  The relationship between the number of 
>> vertices and the number of rasterization cells depends, I suspect, 
>> on the nature of the values.
>> However, if we want to increase the limit: each "cell_block" is 
>> 4096 cells of 16 bytes each, and the count currently maxes out at 
>> 1024 cell blocks, for a total of 67,108,864 bytes.  So the question 
>> is, how much memory should be devoted to rasterization when the data 
>> set is this large?  I think we could safely quadruple the limit on a 
>> lot of modern machines, and a larger maximum won't affect people 
>> plotting smaller data sets, since the memory is allocated dynamically 
>> anyway.  It works for me, but I have 4 GB of RAM here at work.
>
> It sounds like we have little to lose by increasing the limit as you 
> suggest here.  In addition, it would be nice if hitting that limit 
> triggered an informative exception instead of a puzzling and quiet 
> failure, but maybe that would be hard to arrange.  I have no idea how 
> to approach it.
Agreed -- but I'm also not sure how to do that.  I can see where the 
limit is tested and no more memory is allocated, but not where drawing 
shuts down after that.  If we can find that point, we should be able 
to throw an exception back to Python somehow.
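
For the record, the arithmetic from my earlier message, as throwaway 
Python (the sizes are the ones quoted from agg_rasterizer_cells_aa.h):

    cell_block_size  = 4096   # cells per block
    cell_size        = 16     # bytes per cell
    cell_block_limit = 1024   # current maximum number of blocks

    print(cell_block_size * cell_size * cell_block_limit)      # 67108864 bytes, i.e. 64 MB
    print(cell_block_size * cell_size * cell_block_limit * 4)  # 268435456 bytes, i.e. 256 MB if quadrupled
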
>
>>> With or without the nan, this test case also shows the bizarre 
>>> slowness of add_line that I asked about in a message yesterday, and 
>>> that has me completely baffled.
>> lsprofcalltree is my friend!
>
> Thank you very much for finding that!
>
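
For anyone else who wants to chase this kind of slowdown, the recipe is 
roughly the following.  The cProfile/pstats part is standard library; 
the commented-out lines assume the API of the third-party 
lsprofcalltree.py script, so treat those as a sketch:

    import cProfile
    import pstats

    prof = cProfile.Profile()
    prof.run("plot(xx, yy)")   # assumes xx, yy and plot are defined as in the example above

    # Plain-text summary of the 20 most expensive calls, by cumulative time:
    pstats.Stats(prof).sort_stats("cumulative").print_stats(20)

    # Or convert the same profile for graphical browsing in KCachegrind:
    # import lsprofcalltree
    # lsprofcalltree.KCacheGrind(prof).output(open("plot.kgrind", "w"))
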
>>>
>>> Both of these are major problems for real-world use.
>>>
>>> Do you have any thoughts on timing and strategy for solving this 
>>> problem?  A few weeks ago, when the problem with nans and path 
>>> simplification turned up, I tried to figure out what was going on 
>>> and how to fix it, but I did not get very far.  I could try again, 
>>> but as you know I don't get along well with C++.
>> That simplification code is pretty hairy, particularly because it 
>> tries to avoid a copy by doing everything in an iterator/generator 
>> way.  I think even just supporting MOVETOs there would be tricky, 
>> but it is probably the easiest first thing to try.
>
> The attached patch seems to work, based on cursory testing.  I can 
> make an array of 1M points, salt it with nans, and plot it, complete 
> with gaps, and all in a reasonably snappy fashion, thanks to your 
> units fix.
Very nice!  It looks like a sound approach -- though I see from your 
second message that things aren't quite perfect yet.  I, too, feel 
it's close.
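
For anyone who wants to reproduce the kind of test Eric describes, 
something along these lines should do it (the array size and the nan 
spacing are just illustrative):

    import numpy as np
    from pylab import plot, show

    x = np.arange(1000000)
    y = np.random.rand(1000000)
    y[::1000] = np.nan   # salt the data with nans so the path is full of gaps
    plot(x, y)
    show()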

One possible minor improvement would be to change the "should_simplify" 
expression so that it is also true when codes is not None but contains 
only LINETOs and MOVETOs (not curves, obviously) -- see the sketch 
below.  I don't imagine many people build up their own paths with 
explicit MOVETOs in them, but with your improvement in place this 
change would at least make simplifying those paths possible.
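
A rough sketch of the expression I have in mind (the names follow 
path.py, but the exact test and the length cutoff are just 
illustrative):

    import numpy as np
    from matplotlib.path import Path

    def should_simplify(codes, vertices):
        # Simplify long unbroken polylines (codes is None) and -- the
        # suggested extension -- paths whose codes are nothing but
        # MOVETOs and LINETOs.
        if len(vertices) < 128:     # illustrative length cutoff
            return False
        if codes is None:
            return True
        return np.all((codes == Path.MOVETO) | (codes == Path.LINETO))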

Mike
>
> Eric
>
>>>
>>> I am also wondering whether more than straightforward path 
>>> simplification with nan/moveto might be needed.  Suppose there is a 
>>> nightmarish time series with every third point being bad, so it is 
>>> essentially a sequence of 2-point line segments.  The simplest form 
>>> of path simplification fix might be to reset the calculation 
>>> whenever a moveto is encountered, but this would yield no 
>>> simplification in this case.  I assume Agg would still choke. Is 
>>> there a need for some sort of automatic chunking of the rendering 
>>> operation in addition to path simplification?
>>>
>> Chunking is probably something worth looking into (for lines, at 
>> least), as it might also reduce memory usage vs. the "increase the 
>> cell_block_limit" scenario.
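
To make the chunking idea concrete at the user level, splitting the 
call to plot already works around the limit; a throwaway sketch (the 
chunk size is arbitrary, and the color is pinned so the pieces don't 
cycle through the color sequence):

    import numpy as np
    from pylab import plot

    def plot_chunked(x, y, chunksize=100000, color='b', **kwargs):
        # Plot (x, y) in pieces so that no single Agg path becomes too large.
        # Each chunk overlaps the next by one point so the segments join up.
        for i in range(0, len(x), chunksize):
            plot(x[i:i + chunksize + 1], y[i:i + chunksize + 1],
                 color=color, **kwargs)
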
>>
>> I also think that for the special case of high-resolution time 
>> series data, where x is uniform, there is an opportunity to do 
>> something completely different that should be far faster.  Audio 
>> editors (such as Audacity) draw each column of pixels based on the 
>> min/max and/or mean and/or RMS of the values within that column.  
>> This makes the rendering extremely fast and simple.  See:
>>
>> http://audacity.sourceforge.net/about/images/audacity-macosx.png
>>
>> Of course, that would mean writing a bunch of new code, but it 
>> shouldn't be incredibly tricky.  It could convert the time series 
>> data to an image and plot that, or to a filled polygon whose 
>> vertices are downsampled from the original data.  The latter may be 
>> nicer for PS/PDF output.
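
A sketch of the min/max idea for uniformly spaced x (the pixel-column 
count and the use of fill_between are just one way to do it, and nan 
handling is ignored here):

    import numpy as np
    from pylab import fill_between

    def plot_minmax(x, y, ncolumns=800):
        # Reduce the series to one (min, max) pair per horizontal pixel
        # column and draw it as a filled band, the way audio editors do.
        n = (len(y) // ncolumns) * ncolumns    # drop the ragged tail
        xcol = x[:n].reshape(ncolumns, -1)
        ycol = y[:n].reshape(ncolumns, -1)
        fill_between(xcol.mean(axis=1), ycol.min(axis=1), ycol.max(axis=1))
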
>>
>> Cheers,
>> Mike
>>
>

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

