Michael Droettboom wrote:
Eric Firing wrote:
Mike, John,
Because path simplification does not work with anything but a
continuous line, it is turned off if there are any nans in the path.
The result is that if one does this:
import numpy as np
from matplotlib.pyplot import plot
xx = np.arange(200000)
yy = np.random.rand(200000)
#plot(xx, yy)
yy[1000] = np.nan
plot(xx, yy)
the plot fails with an incomplete rendering and general
unresponsiveness; apparently some mysterious agg limit is quietly
exceeded.
The limit in question is "cell_block_limit" in
agg_rasterizer_cells_aa.h. The relationship between the number of vertices
and the number of rasterization cells, I suspect, depends on the nature of
the values.
However, if we want to increase the limit, each "cell_block" is 4096
cells, each with 16 bytes, and currently it maxes out at 1024 cell
blocks, for a total of 67,108,864 bytes. So, the question is, how much
memory should be devoted to rasterization, when the data set is large
like this? I think we could safely quadruple this number for a lot of
modern machines, and this maximum won't affect people plotting smaller
data sets, since the memory is dynamically allocated anyway. It works
for me, but I have 4GB RAM here at work.
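
The memory figures above can be checked with a few lines of arithmetic. This is only a sketch of the calculation described in the message; the constant names are illustrative and are not taken from agg_rasterizer_cells_aa.h.

```python
# Figures quoted in the message; the names are illustrative only.
CELLS_PER_BLOCK = 4096    # cells in one "cell_block"
BYTES_PER_CELL = 16       # bytes per cell
CELL_BLOCK_LIMIT = 1024   # current maximum number of cell blocks


def max_raster_bytes(block_limit):
    """Upper bound on rasterizer cell memory for a given block limit."""
    return CELLS_PER_BLOCK * BYTES_PER_CELL * block_limit


current = max_raster_bytes(CELL_BLOCK_LIMIT)
quadrupled = max_raster_bytes(4 * CELL_BLOCK_LIMIT)

print(current, current // 2**20, "MiB")        # 67108864 64 MiB
print(quadrupled, quadrupled // 2**20, "MiB")  # 268435456 256 MiB
```

Quadrupling the limit therefore raises the ceiling from 64 MiB to 256 MiB, and that memory is only claimed when a plot actually needs it.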
It sounds like we have little to lose by increasing the limit as you
suggest here. In addition, it would be nice if hitting that limit
triggered an informative exception instead of a puzzling and quiet
failure, but maybe that would be hard to arrange. I have no idea how to
approach it.
With or without the nan, this test case also shows the bizarre
slowness of add_line that I asked about in a message yesterday, and
that has me completely baffled.
lsprofcalltree is my friend!
Thank you very much for finding that!
Both of these are major problems for real-world use.
Do you have any thoughts on timing and strategy for solving this
problem? A few weeks ago, when the problem with nans and path
simplification turned up, I tried to figure out what was going on and
how to fix it, but I did not get very far. I could try again, but as
you know I don't get along well with C++.
That simplification code is pretty hairy, particularly because it tries
to avoid a copy by doing everything in an iterator/generator way. I
think even just supporting MOVETOs there would be tricky, but probably
the easiest first thing.
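
To make the MOVETO idea concrete, here is a minimal Python sketch of how a series containing nans could be turned into a stream of move/line commands, with runs of bad points collapsed into a single break. It is not the C++ iterator discussed in this thread; the function name is hypothetical, and the two constants are defined locally with the same values as matplotlib's `Path.MOVETO` and `Path.LINETO`.

```python
import numpy as np

# Defined locally for illustration; same values as Path.MOVETO / Path.LINETO.
MOVETO, LINETO = 1, 2


def nan_to_commands(x, y):
    """Yield (code, x, y) for each finite point (hypothetical helper).

    Non-finite points are dropped, and the first finite point after a
    gap starts a new subpath with MOVETO.
    """
    start_new = True
    for xi, yi in zip(x, y):
        if not (np.isfinite(xi) and np.isfinite(yi)):
            start_new = True   # consecutive bad points produce one break
            continue
        yield (MOVETO if start_new else LINETO, float(xi), float(yi))
        start_new = False


x = np.arange(5.0)
y = np.array([0.0, 1.0, np.nan, 3.0, 4.0])
commands = list(nan_to_commands(x, y))
```

A simplifier that accepts such a stream would have to flush and reset its state at every MOVETO, which is the tricky part referred to above.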
The attached patch seems to work, based on cursory testing. I can make
an array of 1M points, salt it with nans, and plot it, complete with
gaps, and all in a reasonably snappy fashion, thanks to your units fix.
I will hold off on committing it until I hear from you or John; or if
either of you wants to polish and commit it (or an alternative), that's
even better.
Eric
I am also wondering whether more than straightforward path
simplification with nan/moveto might be needed. Suppose there is a
nightmarish time series with every third point being bad, so it is
essentially a sequence of 2-point line segments. The simplest form of
path simplification fix might be to reset the calculation whenever a
moveto is encountered, but this would yield no simplification in this
case. I assume Agg would still choke. Is there a need for some sort
of automatic chunking of the rendering operation in addition to path
simplification?
Chunking is probably something worth looking into (for lines, at least),
as it might also reduce memory usage vs. the "increase the
cell_block_limit" scenario.
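
One possible shape for such chunking, sketched in Python under the assumption that each chunk would be handed to the renderer as a separate path. The function name and the default chunk size are hypothetical and not taken from matplotlib; consecutive chunks share one point so the joined line has no gaps.

```python
import numpy as np


def iter_chunks(x, y, chunk_size=10000):
    """Yield (x_chunk, y_chunk) pieces of a long line (hypothetical helper).

    Each piece holds at most chunk_size points, and consecutive pieces
    overlap by exactly one point so the segments join up when drawn.
    """
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    n = len(x)
    if n < 2:
        if n:
            yield x, y   # a single point: nothing to split
        return
    start = 0
    while start < n - 1:
        stop = min(start + chunk_size, n)
        yield x[start:stop], y[start:stop]
        start = stop - 1   # reuse the last point as the next first point


xs = np.arange(10)
ys = xs * 2.0
chunks = list(iter_chunks(xs, ys, chunk_size=4))
```

With 10 points and a chunk size of 4 this produces three pieces covering indices 0-3, 3-6 and 6-9. Peak rasterizer memory would then scale with the chunk size rather than with the full data set, at the cost of one render call per chunk.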
I also think for the special case of high-resolution time series data,
where x is uniform, there is an opportunity to do something completely
different that should be far faster. Audio editors (such as Audacity)
draw each column of pixels based on the min/max and/or mean and/or RMS
of the values within that column. This makes the rendering extremely
fast and simple. See:
http://audacity.sourceforge.net/about/images/audacity-macosx.png
Of course, that would mean writing a bunch of new code, but it shouldn't
be incredibly tricky new code. It could convert the time series data to
an image and plot that, or to a filled polygon whose vertices are
downsampled from the original data. The latter may be nicer for Ps/Pdf
output.
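
A rough sketch of the per-column min/max reduction, assuming uniformly spaced x so that each pixel column maps to a contiguous slice of samples. The function name is hypothetical, and this is not how Audacity or matplotlib implement it.

```python
import numpy as np


def minmax_envelope(y, n_columns):
    """Return (lo, hi) arrays holding the min and max of y per pixel column.

    Assumes uniform x spacing. Non-finite samples are ignored; a column
    with no finite samples is reported as nan in both arrays.
    """
    y = np.asarray(y, dtype=float)
    # Integer sample indices marking the boundaries between columns.
    edges = np.linspace(0, len(y), n_columns + 1).astype(int)
    lo = np.full(n_columns, np.nan)
    hi = np.full(n_columns, np.nan)
    for i in range(n_columns):
        column = y[edges[i]:edges[i + 1]]
        column = column[np.isfinite(column)]
        if column.size:
            lo[i] = column.min()
            hi[i] = column.max()
    return lo, hi


data = np.arange(12.0)
data[5] = np.nan
lo, hi = minmax_envelope(data, 3)
```

The two arrays have one entry per pixel column regardless of the input length, so they could be drawn as a filled polygon (for example with matplotlib's `fill_between`) or rasterized into an image.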
Cheers,
Mike
Index: src/agg_py_path_iterator.h
===================================================================
--- src/agg_py_path_iterator.h (revision 6166)
+++ src/agg_py_path_iterator.h (working copy)
@@ -137,7 +137,8 @@
double width = 0.0, double height = 0.0) :
m_source(&source), m_quantize(quantize), m_simplify(simplify),
m_width(width + 1.0), m_height(height + 1.0), m_queue_read(0), m_queue_write(0),
- m_moveto(true), m_lastx(0.0), m_lasty(0.0), m_clipped(false),
+ m_moveto(true), m_after_moveto(false),
+ m_lastx(0.0), m_lasty(0.0), m_clipped(false),
m_do_clipping(width > 0.0 && height > 0.0),
m_origdx(0.0), m_origdy(0.0),
m_origdNorm2(0.0), m_dnorm2Max(0.0), m_dnorm2Min(0.0),
@@ -205,6 +206,7 @@
*y = front.y;
#if DEBUG_SIMPLIFY
printf((cmd == agg::path_cmd_move_to) ? "|" : "-");
+ printf(" 1 %f %f\n", *x, *y);
#endif
return cmd;
}
@@ -239,8 +241,37 @@
//if we are starting a new path segment, move to the first point
// + init
- if (m_moveto)
+ //printf("x, y, code: %f, %f, %d\n", *x, *y, cmd);
+ if (!m_moveto && cmd == agg::path_cmd_move_to)
{
+ if (m_after_moveto)
+ {
+ #if DEBUG_SIMPLIFY
+ m_skipped++;
+ #endif
+ continue;
+ }
+ m_after_moveto = true;
+ _push(x, y);
+ m_queue[m_queue_write++].set(agg::path_cmd_move_to, *x, *y);
+ _reset();
+ break;
+ }
+ else
+ {
+ m_after_moveto = false;
+ }
+
+ if (m_moveto) // Maybe combine with above.
+ {
+ if (m_after_moveto)
+ {
+ #if DEBUG_SIMPLIFY
+ m_skipped++;
+ #endif
+ continue;
+ }
+ m_after_moveto = true;
m_lastx = *x;
m_lasty = *y;
m_moveto = false;
@@ -248,6 +279,8 @@
#if DEBUG_SIMPLIFY
m_pushed++;
printf("|");
+ printf(" 2 %f %f\n", *x, *y);
+
#endif
return agg::path_cmd_move_to;
}
@@ -376,7 +409,6 @@
#endif
continue;
}
-
//if we get here, then this vector was not similar enough to the
//line we are building, so we need to draw that line and start the
//next one.
@@ -384,46 +416,9 @@
//if the line needs to extend in the opposite direction from the
//direction we are drawing in, move back to we start drawing from
//back there.
- if (m_haveMin)
- {
- m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_minX, m_minY);
- }
- m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_maxX, m_maxY);
- //if we clipped some segments between this line and the next line
- //we are starting, we also need to move to the last point.
- if (m_clipped) {
- m_queue[m_queue_write++].set(agg::path_cmd_move_to, m_lastx, m_lasty);
- }
- else if (!m_lastMax)
- {
- //if the last line was not the longest line, then move back to
- //the end point of the last line in the sequence. Only do this
- //if not clipped, since in that case lastx,lasty is not part of
- //the line just drawn.
+ _push(x, y);
- //Would be move_to if not for the artifacts
- m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_lastx, m_lasty);
- }
-
- //now reset all the variables to get ready for the next line
- m_origdx = *x - m_lastx;
- m_origdy = *y - m_lasty;
- m_origdNorm2 = m_origdx*m_origdx + m_origdy*m_origdy;
-
- m_dnorm2Max = m_origdNorm2;
- m_dnorm2Min = 0.0;
- m_haveMin = false;
- m_lastMax = true;
- m_lastx = m_maxX = *x;
- m_lasty = m_maxY = *y;
- m_lastWrittenX = m_minX = m_lastx;
- m_lastWrittenY = m_minY = m_lasty;
-
- m_clipped = false;
-#if DEBUG_SIMPLIFY
- m_pushed += m_queue_write - m_queue_read;
-#endif
break;
}
@@ -453,6 +448,8 @@
*y = front.y;
#if DEBUG_SIMPLIFY
printf((cmd == agg::path_cmd_move_to) ? "|" : "-");
+ printf(" 3 %f %f\n", *x, *y);
+
#endif
return cmd;
}
@@ -489,6 +486,7 @@
item m_queue[6];
bool m_moveto;
+ bool m_after_moveto;
double m_lastx, m_lasty;
bool m_clipped;
bool m_do_clipping;
@@ -512,6 +510,72 @@
unsigned m_pushed;
unsigned m_skipped;
#endif
+
+ void _reset(void)
+ {
+ m_lastx = (0.0);
+ m_lasty = (0.0);
+ m_clipped = (false);
+ m_origdx = (0.0);
+ m_origdy = (0.0);
+ m_origdNorm2 = (0.0);
+ m_dnorm2Max = (0.0);
+ m_dnorm2Min = (0.0);
+ m_haveMin = (false);
+ m_lastMax = (false);
+ m_maxX = (0.0);
+ m_maxY = (0.0);
+ m_minX = (0.0);
+ m_minY = (0.0);
+ m_lastWrittenX = (0.0);
+ m_lastWrittenY = (0.0);
+ }
+
+ void _push(double* x, double* y)
+ {
+ if (m_haveMin)
+ {
+ m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_minX, m_minY);
+ }
+ m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_maxX, m_maxY);
+
+ //if we clipped some segments between this line and the next line
+ //we are starting, we also need to move to the last point.
+ if (m_clipped) {
+ m_queue[m_queue_write++].set(agg::path_cmd_move_to, m_lastx, m_lasty);
+ }
+ else if (!m_lastMax)
+ {
+ //if the last line was not the longest line, then move back to
+ //the end point of the last line in the sequence. Only do this
+ //if not clipped, since in that case lastx,lasty is not part of
+ //the line just drawn.
+
+ //Would be move_to if not for the artifacts
+ m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_lastx, m_lasty);
+ }
+
+ //now reset all the variables to get ready for the next line
+ m_origdx = *x - m_lastx;
+ m_origdy = *y - m_lasty;
+ m_origdNorm2 = m_origdx*m_origdx + m_origdy*m_origdy;
+
+ m_dnorm2Max = m_origdNorm2;
+ m_dnorm2Min = 0.0;
+ m_haveMin = false;
+ m_lastMax = true;
+ m_lastx = m_maxX = *x;
+ m_lasty = m_maxY = *y;
+ m_lastWrittenX = m_minX = m_lastx;
+ m_lastWrittenY = m_minY = m_lasty;
+
+ m_clipped = false;
+#if DEBUG_SIMPLIFY
+ m_pushed += m_queue_write - m_queue_read;
+#endif
+
+ }
+
};
#endif // __AGG_PY_PATH_ITERATOR_H__
Index: lib/matplotlib/path.py
===================================================================
--- lib/matplotlib/path.py (revision 6166)
+++ lib/matplotlib/path.py (working copy)
@@ -110,7 +110,7 @@
assert vertices.shape[1] == 2
self.should_simplify = (codes is None and
- np.all(np.isfinite(vertices)) and
+ ## np.all(np.isfinite(vertices)) and
len(vertices) >= 128)
self.codes = codes
self.vertices = vertices
Index: examples/pylab_examples/clippedline.py
===================================================================
--- examples/pylab_examples/clippedline.py (revision 6166)
+++ examples/pylab_examples/clippedline.py (working copy)
@@ -19,7 +19,7 @@
def set_data(self, *args, **kwargs):
Line2D.set_data(self, *args, **kwargs)
- if self._invalid:
+ if self._invalid:
self.recache()
self.xorig = np.array(self._x)
self.yorig = np.array(self._y)
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel