The patch in that last message of mine was clearly not quite right. I have gone through several iterations and have seemed tantalizingly close, but I still don't have it right. I need to leave it alone for a while, but I do think it is important to get this working correctly ASAP--certainly it is for my own work, at least.
What happens with a nan should be somewhat similar to what happens with clipping, so perhaps one could take advantage of part of the clipping logic, but I have not looked at this approach closely.

Eric

Eric Firing wrote:
> Michael Droettboom wrote:
>> Eric Firing wrote:
>>> Mike, John,
>>>
>>> Because path simplification does not work with anything but a
>>> continuous line, it is turned off if there are any nans in the path.
>>> The result is that if one does this:
>>>
>>> import numpy as np
>>> xx = np.arange(200000)
>>> yy = np.random.rand(200000)
>>> #plot(xx, yy)
>>> yy[1000] = np.nan
>>> plot(xx, yy)
>>>
>>> the plot fails with an incomplete rendering and general
>>> unresponsiveness; apparently some mysterious agg limit is quietly
>>> exceeded.
>> The limit in question is "cell_block_limit" in
>> agg_rasterizer_cells_aa.h. The relationship between the number of
>> vertices and the number of rasterization cells I suspect depends on
>> the nature of the values.
>> However, if we want to increase the limit, each "cell_block" is 4096
>> cells, each with 16 bytes, and currently it maxes out at 1024 cell
>> blocks, for a total of 67,108,864 bytes. So, the question is, how
>> much memory should be devoted to rasterization, when the data set is
>> large like this? I think we could safely quadruple this number for a
>> lot of modern machines, and this maximum won't affect people plotting
>> smaller data sets, since the memory is dynamically allocated anyway.
>> It works for me, but I have 4GB RAM here at work.
>
> It sounds like we have little to lose by increasing the limit as you
> suggest here. In addition, it would be nice if hitting that limit
> triggered an informative exception instead of a puzzling and quiet
> failure, but maybe that would be hard to arrange. I have no idea how to
> approach it.
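For anyone who wants to check the memory figures quoted above, here is a small Python calculation. It only reproduces the arithmetic from Mike's message (4096 cells per block, 16 bytes per cell, 1024 blocks); the constant and function names are made up for this illustration and are not Agg identifiers, apart from the idea of a cell block limit.

```python
# Figures taken from the quoted message; the names below are illustrative
# only and are not identifiers from agg_rasterizer_cells_aa.h.
CELLS_PER_BLOCK = 4096
BYTES_PER_CELL = 16
CELL_BLOCK_LIMIT = 1024


def max_cell_bytes(block_limit=CELL_BLOCK_LIMIT):
    """Upper bound on rasterizer cell memory for a given block limit."""
    return block_limit * CELLS_PER_BLOCK * BYTES_PER_CELL


current = max_cell_bytes()
quadrupled = max_cell_bytes(4 * CELL_BLOCK_LIMIT)

print(current, "bytes =", current // 2**20, "MiB")
print(quadrupled, "bytes =", quadrupled // 2**20, "MiB")
```

So the present ceiling is 64 MiB, and quadrupling it would allow up to 256 MiB, allocated only if a plot actually needs it.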
>
>>> With or without the nan, this test case also shows the bizarre
>>> slowness of add_line that I asked about in a message yesterday, and
>>> that has me completely baffled.
>> lsprofcalltree is my friend!
>
> Thank you very much for finding that!
>
>>>
>>> Both of these are major problems for real-world use.
>>>
>>> Do you have any thoughts on timing and strategy for solving this
>>> problem? A few weeks ago, when the problem with nans and path
>>> simplification turned up, I tried to figure out what was going on and
>>> how to fix it, but I did not get very far. I could try again, but as
>>> you know I don't get along well with C++.
>> That simplification code is pretty hairy, particularly because it
>> tries to avoid a copy by doing everything in an iterator/generator
>> way. I think even just supporting MOVETOs there would be tricky, but
>> probably the easiest first thing.
>
> The attached patch seems to work, based on cursory testing. I can make
> an array of 1M points, salt it with nans, and plot it, complete with
> gaps, and all in a reasonably snappy fashion, thanks to your units fix.
>
> I will hold off on committing it until I hear from you or John; or if
> either of you want to polish and commit it (or an alternative), that's
> even better.
>
> Eric
>
>>>
>>> I am also wondering whether more than straightforward path
>>> simplification with nan/moveto might be needed. Suppose there is a
>>> nightmarish time series with every third point being bad, so it is
>>> essentially a sequence of 2-point line segments. The simplest form
>>> of path simplification fix might be to reset the calculation whenever
>>> a moveto is encountered, but this would yield no simplification in
>>> this case. I assume Agg would still choke. Is there a need for some
>>> sort of automatic chunking of the rendering operation in addition to
>>> path simplification?
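To make the nan/moveto and chunking ideas above concrete, here is a rough pure-Python sketch. It is not the attached patch and not the C++ simplification code; it only illustrates the bookkeeping. The function names nan_to_moveto and chunked are hypothetical, and the two code values are the ones matplotlib's Path uses for MOVETO and LINETO.

```python
import math

MOVETO, LINETO = 1, 2  # same numeric values as matplotlib.path.Path


def nan_to_moveto(xs, ys):
    """Drop nan points and start a new subpath (MOVETO) after each gap."""
    verts, codes = [], []
    pen_down = False
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            pen_down = False  # the next good point begins a new subpath
            continue
        verts.append((x, y))
        codes.append(LINETO if pen_down else MOVETO)
        pen_down = True
    return verts, codes


def chunked(verts, codes, max_vertices):
    """Yield (verts, codes) pieces that could be rendered one at a time.

    When a cut falls inside a continuous run, the piece is prefixed with
    the previous vertex as a MOVETO so that no line segment is lost; such
    a piece therefore holds max_vertices + 1 points.
    """
    start = 0
    while start < len(verts):
        stop = min(start + max_vertices, len(verts))
        v, c = verts[start:stop], codes[start:stop]
        if c[0] == LINETO:
            v = [verts[start - 1]] + v
            c = [MOVETO] + c
        yield v, c
        start = stop


# The "every third point is bad" case: only 2-point segments survive.
nan = float("nan")
verts, codes = nan_to_moveto(range(9), [0, 1, nan, 3, 4, nan, 6, 7, nan])
print(codes)

# A continuous 10-point line cut into pieces of about 4 vertices.
line_verts, line_codes = nan_to_moveto(range(10), range(10))
pieces = list(chunked(line_verts, line_codes, 4))
print([len(v) for v, _ in pieces])
```

In the every-third-point case each subpath has just two points, so resetting the simplification at every MOVETO gains nothing, which is the situation where chunking would have to carry the load.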
>>>
>> Chunking is probably something worth looking into (for lines, at
>> least), as it might also reduce memory usage vs. the "increase the
>> cell_block_limit" scenario.
>>
>> I also think for the special case of high-resolution time series data,
>> where x is uniform, there is an opportunity to do something completely
>> different that should be far faster. Audio editors (such as
>> Audacity) draw each column of pixels based on the min/max and/or mean
>> and/or RMS of the values within that column. This makes the rendering
>> extremely fast and simple. See:
>>
>> http://audacity.sourceforge.net/about/images/audacity-macosx.png
>>
>> Of course, that would mean writing a bunch of new code, but it
>> shouldn't be incredibly tricky new code. It could convert the time
>> series data to an image and plot that, or to a filled polygon whose
>> vertices are downsampled from the original data. The latter may be
>> nicer for Ps/Pdf output.
>>
>> Cheers,
>> Mike

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
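P.S. A rough sketch of the per-pixel-column min/max idea from Mike's quoted message, in plain Python. It assumes uniformly spaced x, skips nans, and uses hypothetical names (minmax_columns, envelope_polygon); it is meant only to show the shape of the reduction, not a proposed matplotlib API.

```python
import math


def minmax_columns(ys, n_columns):
    """Reduce uniformly spaced samples to one (min, max) pair per column.

    Columns that contain no finite samples yield None, which a renderer
    could treat as a gap.
    """
    n = len(ys)
    out = []
    for col in range(n_columns):
        lo = col * n // n_columns
        hi = (col + 1) * n // n_columns
        good = [y for y in ys[lo:hi] if not math.isnan(y)]
        out.append((min(good), max(good)) if good else None)
    return out


def envelope_polygon(columns):
    """Vertices of a filled polygon: upper edge left to right, then the
    lower edge right to left. Assumes no column is None."""
    upper = [(i, hi) for i, (lo, hi) in enumerate(columns)]
    lower = [(i, lo) for i, (lo, hi) in enumerate(columns)]
    return upper + lower[::-1]


# 200000 samples reduced to an 800-column envelope.
ys = [math.sin(i * 0.001) for i in range(200000)]
cols = minmax_columns(ys, 800)
poly = envelope_polygon(cols)
print(len(cols), len(poly))

# A tiny case with a nan in it.
small = minmax_columns([1, 5, 2, float("nan"), 7, 0], 3)
print(small)
```

The polygon has two vertices per pixel column regardless of how many samples went in, which is why this should stay cheap for very long series and for vector output such as Ps/Pdf.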