Michael Droettboom wrote:
Eric Firing wrote:
Mike, John,

Because path simplification does not work with anything but a continuous line, it is turned off if there are any nans in the path. The result is that if one does this:

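import numpy as np
from matplotlib.pyplot import plot, show

xx = np.arange(200000)
yy = np.random.rand(200000)
# plot(xx, yy)       # no nans: path simplification stays enabled
yy[1000] = np.nan    # a single nan turns path simplification off
plot(xx, yy)
show()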

the plot fails with an incomplete rendering and general unresponsiveness; apparently some mysterious agg limit is quietly exceeded.
The limit in question is "cell_block_limit" in agg_rasterizer_cells_aa.h. I suspect the relationship between the number of vertices and the number of rasterization cells depends on the nature of the values. However, if we want to increase the limit, each "cell_block" is 4096 cells, each with 16 bytes, and currently it maxes out at 1024 cell blocks, for a total of 67,108,864 bytes. So, the question is, how much memory should be devoted to rasterization when the data set is large like this? I think we could safely quadruple this number for a lot of modern machines, and this maximum won't affect people plotting smaller data sets, since the memory is dynamically allocated anyway. It works for me, but I have 4GB RAM here at work.
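
As a quick check of the arithmetic above, here is a minimal sketch. The constant and function names are made up for illustration; only the numeric values come from the message.

```python
# Values quoted in the message; the names are illustrative only.
CELLS_PER_BLOCK = 4096    # cells in one "cell_block"
BYTES_PER_CELL = 16       # bytes per rasterization cell
CELL_BLOCK_LIMIT = 1024   # current maximum number of cell blocks


def max_raster_bytes(block_limit):
    """Upper bound on cell memory for a given cell block limit."""
    return block_limit * CELLS_PER_BLOCK * BYTES_PER_CELL


current = max_raster_bytes(CELL_BLOCK_LIMIT)
quadrupled = max_raster_bytes(4 * CELL_BLOCK_LIMIT)
print(current, quadrupled)
```

So quadrupling the limit would raise the ceiling from 64 MiB to 256 MiB, and only for plots that actually need that many cells.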

It sounds like we have little to lose by increasing the limit as you suggest here. In addition, it would be nice if hitting that limit triggered an informative exception instead of a puzzling and quiet failure, but maybe that would be hard to arrange. I have no idea how to approach it.

With or without the nan, this test case also shows the bizarre slowness of add_line that I asked about in a message yesterday, and that has me completely baffled.
lsprofcalltree is my friend!

Thank you very much for finding that!


Both of these are major problems for real-world use.

Do you have any thoughts on timing and strategy for solving this problem? A few weeks ago, when the problem with nans and path simplification turned up, I tried to figure out what was going on and how to fix it, but I did not get very far. I could try again, but as you know I don't get along well with C++.
That simplification code is pretty hairy, particularly because it tries to avoid a copy by doing everything in an iterator/generator way. I think even just supporting MOVETOs there would be tricky, but probably the easiest first thing.
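
To make "supporting MOVETOs" concrete, here is a rough Python sketch of how a series containing nans could be described as vertices plus path codes. This is not the code in the patch. The function name nan_to_codes is hypothetical, and the constants are local stand-ins assumed to match matplotlib.path.Path.MOVETO and Path.LINETO.

```python
import numpy as np

# Local stand-ins, assumed to match Path.MOVETO and Path.LINETO.
MOVETO, LINETO = 1, 2


def nan_to_codes(x, y):
    """Drop non-finite points and start a new sub-path after each gap."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    finite = np.isfinite(x) & np.isfinite(y)
    # A point continues a line only if the point before it was finite too.
    prev_finite = np.concatenate(([False], finite[:-1]))
    codes = np.where(prev_finite, LINETO, MOVETO)
    verts = np.column_stack((x[finite], y[finite]))
    return verts, codes[finite]


verts, codes = nan_to_codes([0, 1, 2, 3, 4], [0.0, 1.0, np.nan, 3.0, 4.0])
print(codes)
```

Any simplifier consuming such a path has to flush its pending segment and restart whenever it sees a MOVETO, which is the part that is awkward in the iterator-style C++ code.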

The attached patch seems to work, based on cursory testing. I can make an array of 1M points, salt it with nans, and plot it, complete with gaps, and all in a reasonably snappy fashion, thanks to your units fix.

I will hold off on committing it until I hear from you or John; or if either of you want to polish and commit it (or an alternative), that's even better.

Eric


I am also wondering whether more than straightforward path simplification with nan/moveto might be needed. Suppose there is a nightmarish time series with every third point being bad, so it is essentially a sequence of 2-point line segments. The simplest form of path simplification fix might be to reset the calculation whenever a moveto is encountered, but this would yield no simplification in this case. I assume Agg would still choke. Is there a need for some sort of automatic chunking of the rendering operation in addition to path simplification?

Chunking is probably something worth looking into (for lines, at least), as it might also reduce memory usage vs. the "increase the cell_block_limit" scenario.
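
A minimal sketch of what chunking could look like on the Python side, assuming the renderer can be handed one piece at a time. The name iter_chunks and the chunk size are hypothetical, and this is not how matplotlib does it. Consecutive chunks share one point so the drawn line stays continuous across chunk boundaries.

```python
import numpy as np


def iter_chunks(x, y, chunk_size=10000):
    """Yield (x, y) slices that overlap by one point."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    n = len(x)
    start = 0
    while start < n - 1:
        stop = min(start + chunk_size, n)
        yield x[start:stop], y[start:stop]
        # Re-use the last point as the first point of the next chunk.
        start = stop - 1


x = np.arange(25)
chunks = list(iter_chunks(x, x, chunk_size=10))
lengths = [len(cx) for cx, _ in chunks]
print(lengths)
```

Each chunk could then be drawn as its own path, so the number of rasterization cells in flight is bounded by the chunk size rather than by the whole data set.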

I also think that for the special case of high-resolution time series data, where x is uniform, there is an opportunity to do something completely different that should be far faster. Audio editors (such as Audacity) draw each column of pixels based on the min/max and/or mean and/or RMS of the values within that column. This makes the rendering extremely fast and simple. See:

http://audacity.sourceforge.net/about/images/audacity-macosx.png

Of course, that would mean writing a bunch of new code, but it shouldn't be incredibly tricky new code. It could convert the time series data to an image and plot that, or to a filled polygon whose vertices are downsampled from the original data. The latter may be nicer for PS/PDF output.
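
The per-column min/max idea can be sketched in a few lines of NumPy. This is only an illustration under the assumption that x is uniform, so each pixel column covers a contiguous run of samples. The name minmax_envelope is hypothetical, and nans are simply ignored within a column.

```python
import numpy as np


def minmax_envelope(y, n_columns):
    """Return per-column (min, max) of y, one entry per pixel column."""
    y = np.asarray(y, dtype=float)
    # Sample index boundaries of each pixel column.
    edges = np.linspace(0, len(y), n_columns + 1).astype(int)
    lo = np.full(n_columns, np.nan)
    hi = np.full(n_columns, np.nan)
    for i in range(n_columns):
        seg = y[edges[i]:edges[i + 1]]
        seg = seg[np.isfinite(seg)]
        if seg.size:  # columns with no finite data stay nan
            lo[i] = seg.min()
            hi[i] = seg.max()
    return lo, hi


y = np.arange(12.0)
y[5] = np.nan
lo, hi = minmax_envelope(y, 3)
print(lo, hi)
```

The two arrays are the lower and upper edges of the filled polygon mentioned above, so the vertex count depends on the width of the plot in pixels rather than on the number of samples.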

Cheers,
Mike


Index: src/agg_py_path_iterator.h
===================================================================
--- src/agg_py_path_iterator.h	(revision 6166)
+++ src/agg_py_path_iterator.h	(working copy)
@@ -137,7 +137,8 @@
                  double width = 0.0, double height = 0.0) :
             m_source(&source), m_quantize(quantize), m_simplify(simplify),
             m_width(width + 1.0), m_height(height + 1.0), m_queue_read(0), m_queue_write(0),
-            m_moveto(true), m_lastx(0.0), m_lasty(0.0), m_clipped(false),
+            m_moveto(true), m_after_moveto(false),
+            m_lastx(0.0), m_lasty(0.0), m_clipped(false),
             m_do_clipping(width > 0.0 && height > 0.0),
             m_origdx(0.0), m_origdy(0.0),
             m_origdNorm2(0.0), m_dnorm2Max(0.0), m_dnorm2Min(0.0),
@@ -205,6 +206,7 @@
             *y = front.y;
 #if DEBUG_SIMPLIFY
             printf((cmd == agg::path_cmd_move_to) ? "|" : "-");
+            printf(" 1 %f %f\n", *x, *y);
 #endif
             return cmd;
         }
@@ -239,8 +241,37 @@
 
             //if we are starting a new path segment, move to the first point
             // + init
-            if (m_moveto)
+            //printf("x, y, code: %f, %f, %d\n", *x, *y, cmd);
+            if (!m_moveto && cmd == agg::path_cmd_move_to)
             {
+                if (m_after_moveto)
+                {
+                #if DEBUG_SIMPLIFY
+                    m_skipped++;
+                #endif
+                    continue;
+                }
+                m_after_moveto = true;
+                _push(x, y);
+                m_queue[m_queue_write++].set(agg::path_cmd_move_to, *x, *y);
+                _reset();
+                break;
+            }
+            else
+            {
+                m_after_moveto = false;
+            }
+
+            if (m_moveto)  // Maybe combine with above.
+            {
+                if (m_after_moveto)
+                {
+                #if DEBUG_SIMPLIFY
+                    m_skipped++;
+                #endif
+                    continue;
+                }
+                m_after_moveto = true;
                 m_lastx = *x;
                 m_lasty = *y;
                 m_moveto = false;
@@ -248,6 +279,8 @@
 #if DEBUG_SIMPLIFY
                 m_pushed++;
                 printf("|");
+                printf(" 2 %f %f\n", *x, *y);
+
 #endif
                 return agg::path_cmd_move_to;
             }
@@ -376,7 +409,6 @@
 #endif
                 continue;
             }
-
             //if we get here, then this vector was not similar enough to the
             //line we are building, so we need to draw that line and start the
             //next one.
@@ -384,46 +416,9 @@
             //if the line needs to extend in the opposite direction from the
             //direction we are drawing in, move back to we start drawing from
             //back there.
-            if (m_haveMin)
-            {
-                m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_minX, m_minY);
-            }
-            m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_maxX, m_maxY);
 
-            //if we clipped some segments between this line and the next line
-            //we are starting, we also need to move to the last point.
-            if (m_clipped) {
-                m_queue[m_queue_write++].set(agg::path_cmd_move_to, m_lastx, m_lasty);
-            }
-            else if (!m_lastMax)
-            {
-                //if the last line was not the longest line, then move back to
-                //the end point of the last line in the sequence. Only do this
-                //if not clipped, since in that case lastx,lasty is not part of
-                //the line just drawn.
+            _push(x, y);
 
-                //Would be move_to if not for the artifacts
-                m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_lastx, m_lasty);
-            }
-
-            //now reset all the variables to get ready for the next line
-            m_origdx = *x - m_lastx;
-            m_origdy = *y - m_lasty;
-            m_origdNorm2 = m_origdx*m_origdx + m_origdy*m_origdy;
-
-            m_dnorm2Max = m_origdNorm2;
-            m_dnorm2Min = 0.0;
-            m_haveMin = false;
-            m_lastMax = true;
-            m_lastx = m_maxX = *x;
-            m_lasty = m_maxY = *y;
-            m_lastWrittenX = m_minX = m_lastx;
-            m_lastWrittenY = m_minY = m_lasty;
-
-            m_clipped = false;
-#if DEBUG_SIMPLIFY
-            m_pushed += m_queue_write - m_queue_read;
-#endif
             break;
         }
 
@@ -453,6 +448,8 @@
             *y = front.y;
 #if DEBUG_SIMPLIFY
             printf((cmd == agg::path_cmd_move_to) ? "|" : "-");
+            printf(" 3 %f %f\n", *x, *y);
+
 #endif
             return cmd;
         }
@@ -489,6 +486,7 @@
     item m_queue[6];
 
     bool m_moveto;
+    bool m_after_moveto;
     double m_lastx, m_lasty;
     bool m_clipped;
     bool m_do_clipping;
@@ -512,6 +510,72 @@
     unsigned m_pushed;
     unsigned m_skipped;
 #endif
+
+    void _reset(void)
+    {
+        m_lastx = (0.0);
+        m_lasty = (0.0);
+        m_clipped = (false);
+        m_origdx = (0.0);
+        m_origdy = (0.0);
+        m_origdNorm2 = (0.0);
+        m_dnorm2Max = (0.0);
+        m_dnorm2Min = (0.0);
+        m_haveMin = (false);
+        m_lastMax = (false);
+        m_maxX = (0.0);
+        m_maxY = (0.0);
+        m_minX = (0.0);
+        m_minY = (0.0);
+        m_lastWrittenX = (0.0);
+        m_lastWrittenY = (0.0);
+    }
+
+    void _push(double* x, double* y)
+    {
+            if (m_haveMin)
+            {
+                m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_minX, m_minY);
+            }
+            m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_maxX, m_maxY);
+
+            //if we clipped some segments between this line and the next line
+            //we are starting, we also need to move to the last point.
+            if (m_clipped) {
+                m_queue[m_queue_write++].set(agg::path_cmd_move_to, m_lastx, m_lasty);
+            }
+            else if (!m_lastMax)
+            {
+                //if the last line was not the longest line, then move back to
+                //the end point of the last line in the sequence. Only do this
+                //if not clipped, since in that case lastx,lasty is not part of
+                //the line just drawn.
+
+                //Would be move_to if not for the artifacts
+                m_queue[m_queue_write++].set(agg::path_cmd_line_to, m_lastx, m_lasty);
+            }
+
+            //now reset all the variables to get ready for the next line
+            m_origdx = *x - m_lastx;
+            m_origdy = *y - m_lasty;
+            m_origdNorm2 = m_origdx*m_origdx + m_origdy*m_origdy;
+
+            m_dnorm2Max = m_origdNorm2;
+            m_dnorm2Min = 0.0;
+            m_haveMin = false;
+            m_lastMax = true;
+            m_lastx = m_maxX = *x;
+            m_lasty = m_maxY = *y;
+            m_lastWrittenX = m_minX = m_lastx;
+            m_lastWrittenY = m_minY = m_lasty;
+
+            m_clipped = false;
+#if DEBUG_SIMPLIFY
+            m_pushed += m_queue_write - m_queue_read;
+#endif
+
+    }
+
 };
 
 #endif // __AGG_PY_PATH_ITERATOR_H__
Index: lib/matplotlib/path.py
===================================================================
--- lib/matplotlib/path.py	(revision 6166)
+++ lib/matplotlib/path.py	(working copy)
@@ -110,7 +110,7 @@
         assert vertices.shape[1] == 2
 
         self.should_simplify = (codes is None and
-                                np.all(np.isfinite(vertices)) and
+                                ## np.all(np.isfinite(vertices)) and
                                 len(vertices) >= 128)
         self.codes = codes
         self.vertices = vertices
Index: examples/pylab_examples/clippedline.py
===================================================================
--- examples/pylab_examples/clippedline.py	(revision 6166)
+++ examples/pylab_examples/clippedline.py	(working copy)
@@ -19,7 +19,7 @@
 
     def set_data(self, *args, **kwargs):
         Line2D.set_data(self, *args, **kwargs)
-        if self._invalid: 
+        if self._invalid:
             self.recache()
         self.xorig = np.array(self._x)
         self.yorig = np.array(self._y)
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel