Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots a

2009-02-23 Thread Michael Droettboom
Thanks for the note on this.  It's nice to know who wrote the original 
version. I'll add a note about this in the code comments.

I'm not seeing a noticeable change in this regard between 0.98.5 (which
uses a pretty direct refactoring of your code) and the SVN trunk.  The
trunk does two things rather differently: 1) it only ever returns points
that exist in the original data, and 2) it clips line segments at the
boundary of the plot.  The latter is to get around a shortcoming of Agg
(and Adobe Reader, for that matter) when plotting lines to very
high-valued coordinates.
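For readers unfamiliar with segment clipping, here is a minimal sketch of the general idea using the textbook Liang-Barsky method. It is only an illustration: the function name `clip_segment` and the rectangle arguments are made up for this example, and it is not claimed to be the routine the trunk actually uses.

```python
def clip_segment(p0, p1, xmin, ymin, xmax, ymax):
    """Clip the segment p0 -> p1 to a rectangle (Liang-Barsky).

    Returns the clipped (start, end) pair, or None if the segment lies
    entirely outside the rectangle. Hypothetical helper for illustration.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # visible part of the segment, as fractions of its length
    # One (p, q) pair per rectangle edge: left, right, bottom, top.
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:  # the segment enters the rectangle through this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:  # the segment leaves the rectangle through this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)


# A segment with very large endpoints shrinks to the part inside a 100x100 plot.
clipped = clip_segment((-1e12, 50.0), (1e12, 50.0), 0.0, 0.0, 100.0, 100.0)
print(clipped)
```

The renderer then only ever sees coordinates within the plot rectangle, which is the point of clipping before drawing.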

But I'd appreciate it if you compared the two yourself, in case
you're seeing some detail that I'm missing.

Cheers,
Mike

Allan Haldane wrote:
 a writes:
   
 Michael Droettboom md...@... writes:
 
 Thanks for the pointers.

 The original simplification code was written by John Hunter (I believe),
 and I don't know if it was designed by him also or is a replication of
 something published elsewhere.  So I take no credit for and have little
 knowledge of its original goals.
   
 I'm not sure on everything it does, but it seems to do clipping and removes
 line segments where the change in slope is less than some limit. There are
 probably better algorithms out there, but this one works surprisingly well
 and is fast and simple. I think it should be a requirement that it returns
 points which are a subset of the original points- with the change you've
 made it does this, right?
 

 Oh Hey! I'm the one who originally wrote the path simplification code. I'd
 have thought it would be gone by now, but I am very happy it turned out to
 be useful. I made it up in order to plot a very large set of noisy data I
 had.

 The goal was to simplify two types of plots at once: Smooth curves, as
 well as very noisy data where many lines are 'on top' of each other. (eg
 plot(rand(10)) ). I noticed both could be taken care of by checking
 for changes in slope.

 An important goal (for me) was making sure that the min/max span of the
 points plotted was preserved. (so that eg plot(rand(1000)) spans from the
 lowest to highest point in the data (ie ~ 0 to 1) for any zoom factor).
 I'm not sure if this property survived...: If you do plot(rand(1000)) with
 the latest matplotlib and gradually zoom out on the x axis, you can see
 the top/bottom tips of the plotted line flickering in height, which is
 what I was trying to avoid. I forget whether I actually got it as I wanted
 it though, maybe I gave up.

 Allan




 --
 Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
 -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
 -Strategies to boost innovation and cut costs with open source participation
 -Receive a $600 discount off the registration fee with the source code: SFAD
 http://p.sf.net/sfu/XcvMzF8H
 ___
 Matplotlib-devel mailing list
 Matplotlib-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
   

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA




Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots a

2009-02-19 Thread Allan Haldane
a writes:
 Michael Droettboom md...@... writes:
 
  Thanks for the pointers.
 
  The original simplification code was written by John Hunter (I believe),
  and I don't know if it was designed by him also or is a replication of
  something published elsewhere.  So I take no credit for and have little
  knowledge of its original goals.

 I'm not sure on everything it does, but it seems to do clipping and removes
 line segments where the change in slope is less than some limit. There are
 probably better algorithms out there, but this one works surprisingly well
 and is fast and simple. I think it should be a requirement that it returns
 points which are a subset of the original points- with the change you've
 made it does this, right?

Oh Hey! I'm the one who originally wrote the path simplification code. I'd
have thought it would be gone by now, but I am very happy it turned out to
be useful. I made it up in order to plot a very large set of noisy data I
had.

The goal was to simplify two types of plots at once: Smooth curves, as
well as very noisy data where many lines are 'on top' of each other. (eg
plot(rand(10)) ). I noticed both could be taken care of by checking
for changes in slope.

An important goal (for me) was making sure that the min/max span of the
points plotted was preserved. (so that eg plot(rand(1000)) spans from the
lowest to highest point in the data (ie ~ 0 to 1) for any zoom factor).
I'm not sure if this property survived...: If you do plot(rand(1000)) with
the latest matplotlib and gradually zoom out on the x axis, you can see
the top/bottom tips of the plotted line flickering in height, which is
what I was trying to avoid. I forget whether I actually got it as I wanted
it though, maybe I gave up.
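To make the min/max property concrete, here is a small sketch of a different, much cruder technique that has it by construction: keep the lowest and highest sample in each bucket of consecutive points (think of one bucket per pixel column). This is not the slope-based simplification discussed in this thread; the name `minmax_decimate` and the bucket count are assumptions made for the example.

```python
import random


def minmax_decimate(ys, n_columns):
    """Keep the lowest and highest sample in each of n_columns contiguous buckets.

    Returns a list of (index, value) pairs in increasing index order.
    Hypothetical helper, not part of matplotlib.
    """
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    n = len(ys)
    kept = []
    for c in range(n_columns):
        lo = c * n // n_columns
        hi = (c + 1) * n // n_columns
        if lo == hi:
            continue  # more columns than samples: this bucket is empty
        bucket = range(lo, hi)
        i_min = min(bucket, key=ys.__getitem__)
        i_max = max(bucket, key=ys.__getitem__)
        for i in sorted({i_min, i_max}):
            kept.append((i, ys[i]))
    return kept


random.seed(0)
ys = [random.random() for _ in range(1000)]
kept = minmax_decimate(ys, 50)
print(len(kept), min(y for _, y in kept), max(y for _, y in kept))
```

Because the global extremes are also the extremes of their own buckets, the plotted span matches the data span for any bucket count, so the tips of the line would not flicker while zooming.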

Allan






Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots

2009-01-21 Thread Michael Droettboom
I've checked this change into SVN so others can test it out.

Assuming we don't discover any cases where this is clearly inferior, it 
should make it into the next major release.

Mike

Andrew Hawryluk wrote:
 -Original Message-
 From: Michael Droettboom [mailto:md...@stsci.edu]
 Sent: 16 Jan 2009 1:31 PM
 To: Andrew Hawryluk
 Cc: matplotlib-devel@lists.sourceforge.net
 Subject: Re: [matplotlib-devel] path simplification can decrease the
 smoothness of data plots

 Michael Droettboom wrote:
 

 ...

   
 I've attached a patch that will only include points from the original
 data in the simplified path.  I hesitate to commit it to SVN, as these
 things are very hard to get right -- and just because it appears to
 work better on this data doesn't mean it doesn't create a regression
 on something else... ;)  That said, it would be nice to confirm that
 this solution works, because it has the added benefit of being a
 little simpler computationally.  Be sure to blitz your build directory
 when testing the patch -- distutils won't pick it up as a dependency.

 I've attached two PDFs -- one with the original (current trunk)
 behavior, and one with the new behavior.  I plotted the unsimplified
 plot in thick blue behind the simplified plot in green, so you can see
 how much deviation there is between the original data and the
 simplified line (you'll want to zoom way in with your PDF viewer to
 see it.)

 I've also included a new version of your test script which detects
 new data values in the simplified path, and also seeds the random
 number generator so that results are comparable.  I also set the
 solid_joinstyle to round, as it makes the wiggliness less pronounced.
 (There was another thread on this list recently about making that the
 default setting).

 Cheers,
 Mike

 --
 Michael Droettboom
 Science Software Branch
 Operations and Engineering Division
 Space Telescope Science Institute
 Operated by AURA for NASA
 

 Thanks for looking into this! The new plot is much improved, and the
 simplified calculations are a pleasant surprise. I was also testing the
 previous algorithm with solid_joinstyle set to round as it is the
 default in my matplotlibrc.

 I am probably not able to build your patch here, unless building
 matplotlib from source on Windows is easier than I anticipate. May I
 send you some data off the list for you to test?

 Regards,
 Andrew

 NOVA Chemicals Research & Technology Centre
 Calgary, Canada
   

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA


--
This SF.net email is sponsored by:
SourceForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
___
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel


Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots

2009-01-21 Thread a


Michael Droettboom md...@... writes:

 
 I've checked this change into SVN so others can test it out.
 
 Assuming we don't discover any cases where this is clearly inferior, it 
 should make it into the next major release.
 
 Mike
 

Hi,

This change looks good- it has the advantage of choosing points that actually
lie on the curve, which is better visually, and would seem to be a better
solution for publication quality plots.

The method for simplifying the paths is quite simple and effective, but a bit
crude- there are other algorithms you might look into for simplifying lines:

  http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm

This one is fairly simple to implement and has the advantage that you have some
control over the errors- the deviation from your simplified path and the actual
path.
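For reference, a minimal sketch of Ramer-Douglas-Peucker in plain Python. The names `rdp` and `epsilon` are my own choices, and this variant measures distance to the chord as a segment rather than to the infinite line. It is recursive, so very long inputs could hit Python's recursion limit, and its running time is not linear in general.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a -> b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Simplify a polyline; every kept vertex comes from the input."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # everything in between is within tolerance
    # Keep the farthest vertex and simplify both halves around it.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


pts = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
print(rdp(pts, 0.1))  # [(0.0, 0.0), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
```

Here `epsilon` is the error control mentioned above: no dropped vertex ends up farther than `epsilon` from the simplified path.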

Also, you might consider making the path simplification tolerance (perpdNorm2)
an adjustable parameter in the matplotlibrc file:

  #src/agg_py_path_iterator.h

//if the perp vector is less than some number of (squared)
//pixels in size, then merge the current vector
if (perpdNorm2 < (1.0 / 9.0))
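To spell out the arithmetic: a threshold of 1/9 squared pixels corresponds to a perpendicular offset of 1/3 of a pixel. The sketch below mirrors that comparison in Python; `perp_norm2` and its arguments are names I made up, not the variables in the C++ source.

```python
import math

THRESHOLD2 = 1.0 / 9.0  # squared pixels, the value in the snippet above


def perp_norm2(line1, line2):
    """Squared length of the component of line2 perpendicular to line1.

    Both arguments are (dx, dy) vectors in pixel units.
    """
    (ax, ay), (bx, by) = line1, line2
    norm2 = ax * ax + ay * ay
    if norm2 == 0.0:
        return bx * bx + by * by  # line1 has no direction: use line2's length
    cross = ax * by - ay * bx
    return cross * cross / norm2


print(math.sqrt(THRESHOLD2))                                # offset limit in pixels
print(perp_norm2((10.0, 0.0), (5.0, 0.2)) < THRESHOLD2)     # 0.2 px off axis: merge
print(perp_norm2((10.0, 0.0), (5.0, 0.5)) < THRESHOLD2)     # 0.5 px off axis: keep
```

A larger threshold merges more segments (fewer points, coarser path); a smaller one keeps more of the original vertices.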


kind regards,

a








Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots

2009-01-21 Thread Michael Droettboom
a wrote:
 Michael Droettboom md...@... writes:

   
 I've checked this change into SVN so others can test it out.

 Assuming we don't discover any cases where this is clearly inferior, it 
 should make it into the next major release.

 Mike

 

 Hi,

 This change looks good- it has the advantage of choosing points that actually
 lie on the curve, which is better visually, and would seem to be a better
 solution for publication quality plots.

 The method for simplifying the paths is quite simple and effective, but a bit
 crude- there are other algorithms you might look into for simplifying lines:

   http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm

 This one is fairly simple to implement and has the advantage that you have
 some control over the errors- the deviation from your simplified path and
 the actual path.
   
Thanks for the pointers.

The original simplification code was written by John Hunter (I believe), 
and I don't know if it was designed by him also or is a replication of 
something published elsewhere.  So I take no credit for and have little 
knowledge of its original goals.

However, IMHO the primary purpose of the path simplification in
matplotlib is to improve interactive performance (smaller file size
is just a convenient side effect of that).  I would hesitate to use an
algorithm that is any worse than O(n), since it must be recalculated on
every pan or zoom: the simplification is related to *pixels*, not
data units.  Even on modern hardware, it is a constant battle keeping
the inner drawing loop fast enough.  We could, of course, make the
choice of algorithm user-configurable, or use something more precise
when using a non-interactive backend, but then we would have two
separate code paths to keep in sync and bug free --- not a choice I take
lightly.

The trick with the present algorithm is to keep the error rate at the 
subpixel level through the correct selection of perpdNorm.  It seems to 
me that the more advanced simplification algorithm is only necessary 
when you want to simplify more aggressively than the pixel level.  But 
what hasn't been done is a proper study of the error rate along the 
simplified path of the current approach vs. other possible approaches.  
Even this latest change was verified by just looking at the results 
which seemingly are better on the data I looked at.  So I'm mostly 
speaking from my gut rather than evidence here.
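One simple way such a study could start is to measure, for each original vertex, its distance to the nearest segment of the simplified path and report the worst case. The sketch below is a brute-force version of that metric; the names are hypothetical and nothing like it exists in the code base as far as this thread says.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a -> b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def max_deviation(original, simplified):
    """Largest distance from any original vertex to the simplified polyline.

    Brute force, O(len(original) * len(simplified)); `simplified` needs at
    least two vertices. Coordinates should be in pixels for the result to
    be comparable with the simplification threshold.
    """
    segments = list(zip(simplified, simplified[1:]))
    return max(
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in original
    )


original = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0), (3.0, 0.3), (4.0, 0.0)]
simplified = [(0.0, 0.0), (4.0, 0.0)]
print(max_deviation(original, simplified))  # 0.3
```

Running this over the output of each candidate algorithm on the same pixel-space data would give a number to compare instead of a visual impression.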

 Also, you might consider making the path simplification tolerance (perpdNorm2)
 an adjustable parameter in the matplotlibrc file:

   #src/agg_py_path_iterator.h

 //if the perp vector is less than some number of (squared)
 //pixels in size, then merge the current vector
 if (perpdNorm2 < (1.0 / 9.0))
   
That sounds like a good idea.  I'll have a look at doing that.

Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA




Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots

2009-01-21 Thread a
Michael Droettboom md...@... writes:




 Thanks for the pointers.
 
 The original simplification code was written by John Hunter (I believe), 
 and I don't know if it was designed by him also or is a replication of 
 something published elsewhere.  So I take no credit for and have little 
 knowledge of its original goals.

I'm not sure on everything it does, but it seems to do clipping and removes 
line segments where the change in slope is less than some limit. There are
probably better algorithms out there, but this one works surprisingly well 
and is fast and simple. I think it should be a requirement that it returns
points which are a subset of the original points- with the change you've 
made it does this, right? 

 
 However, IMHO the primary purpose of the path simplification in
 matplotlib is to improve interactive performance (smaller file size
 is just a convenient side effect of that).  I would hesitate to use an
 algorithm that is any worse than O(n), since it must be recalculated on
 every pan or zoom: the simplification is related to *pixels*, not
 data units.  Even on modern hardware, it is a constant battle keeping
 the inner drawing loop fast enough.  We could, of course, make the
 choice of algorithm user-configurable, or use something more precise
 when using a non-interactive backend, but then we would have two
 separate code paths to keep in sync and bug free --- not a choice I
 take lightly.

I see your point.

I originally encountered a problem when preparing a pdf figure- I had a lot 
of high resolution data, and with path simplification the resulting pdf 
looked pretty bad (the lines were jagged). But the advantage was a massive
reduction in file size of the pdf. I adjusted perpdNorm2 and got much better
results.


 
 The trick with the present algorithm is to keep the error rate at the 
 subpixel level through the correct selection of perpdNorm.  It seems to 
 me that the more advanced simplification algorithm is only necessary 
 when you want to simplify more aggressively than the pixel level.  But 
 what hasn't been done is a proper study of the error rate along the 
 simplified path of the current approach vs. other possible approaches.  
 Even this latest change was verified by just looking at the results 
 which seemingly are better on the data I looked at.  So I'm mostly 
 speaking from my gut rather than evidence here.
 
#src/agg_py_path_iterator.h
 
  //if the perp vector is less than some number of (squared)
  //pixels in size, then merge the current vector
  if (perpdNorm2 < (1.0 / 9.0))

 That sounds like a good idea.  I'll have a look at doing that.
 

Right, perhaps the best thing to do is make the tolerance parameter
adjustable, so it can be loosened to speed up drawing in the interactive
backends, but can also easily be tightened for extra resolution in the
non-interactive backends like pdf/ps.


 Mike
 

a





Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots

2009-01-20 Thread Michael Droettboom

 Thanks for looking into this! The new plot is much improved, and the
 simplified calculations are a pleasant surprise. I was also testing the
 previous algorithm with solid_joinstyle set to round as it is the
 default in my matplotlibrc.

 I am probably not able to build your patch here, unless building
 matplotlib from source on Windows is easier than I anticipate. May I
 send you some data off the list for you to test?
   
No problem.  I'd also want testing from others -- there aren't a lot of 
examples in matplotlib itself where simplification even kicks in.

Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA




Re: [matplotlib-devel] path simplification can decrease the smoothness of data plots

2009-01-19 Thread Andrew Hawryluk
 -Original Message-
 From: Michael Droettboom [mailto:md...@stsci.edu]
 Sent: 16 Jan 2009 1:31 PM
 To: Andrew Hawryluk
 Cc: matplotlib-devel@lists.sourceforge.net
 Subject: Re: [matplotlib-devel] path simplification can decrease the
 smoothness of data plots
 
 Michael Droettboom wrote:

...

 I've attached a patch that will only include points from the original
 data in the simplified path.  I hesitate to commit it to SVN, as these
 things are very hard to get right -- and just because it appears to
 work better on this data doesn't mean it doesn't create a regression
 on something else... ;)  That said, it would be nice to confirm that
 this solution works, because it has the added benefit of being a
 little simpler computationally.  Be sure to blitz your build directory
 when testing the patch -- distutils won't pick it up as a dependency.

 I've attached two PDFs -- one with the original (current trunk)
 behavior, and one with the new behavior.  I plotted the unsimplified
 plot in thick blue behind the simplified plot in green, so you can see
 how much deviation there is between the original data and the
 simplified line (you'll want to zoom way in with your PDF viewer to
 see it.)

 I've also included a new version of your test script which detects
 new data values in the simplified path, and also seeds the random
 number generator so that results are comparable.  I also set the
 solid_joinstyle to round, as it makes the wiggliness less pronounced.
 (There was another thread on this list recently about making that the
 default setting).
 
 Cheers,
 Mike
 
 --
 Michael Droettboom
 Science Software Branch
 Operations and Engineering Division
 Space Telescope Science Institute
 Operated by AURA for NASA

Thanks for looking into this! The new plot is much improved, and the
simplified calculations are a pleasant surprise. I was also testing the
previous algorithm with solid_joinstyle set to round as it is the
default in my matplotlibrc.

I am probably not able to build your patch here, unless building
matplotlib from source on Windows is easier than I anticipate. May I
send you some data off the list for you to test?

Regards,
Andrew

NOVA Chemicals Research & Technology Centre
Calgary, Canada



[matplotlib-devel] path simplification can decrease the smoothness of data plots

2009-01-14 Thread Andrew Hawryluk
I'm really excited about the new path simplification option for vector
output formats. I tried it the first time yesterday and reduced a PDF
from 231 kB to 47 kB. Thanks very much for providing this feature!

However, I have noticed that the simplified paths often look more jagged
than the original, at least for my data. I can recreate the effect with
the following:

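[start]
import numpy as np
import matplotlib.pyplot as plt

# Gaussian bump sampled every 0.001, plus a small amount of noise
x = np.arange(-3, 3, 0.001)
y = np.exp(-x**2) + np.random.normal(scale=0.001, size=x.size)
plt.plot(x, y)
plt.savefig('test.png')  # raster output: looks smooth
plt.savefig('test.pdf')  # vector output: simplified path looks jagged
[end]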

A sample output is attached, and close inspection shows that the PNG is
a smooth curve with a small amount of noise while the PDF version has
very noticeable changes in direction from one line segment to the next.

The simplification algorithm (agg_py_path_iterator.h) does the
following:
   If line2 is nearly parallel to line1, add the parallel component to
   the length of line1, leaving its direction unchanged
which results in a new data point, not contained in the original data.
Line1 will continue to be lengthened until it has deviated from the data
curve enough that the next true data point is considered non-parallel.
The cycle then continues. The result is a line that wanders around the
data curve, and only the first point is guaranteed to have existed in
the original data set.

Instead, could the simplification algorithm do:
   If line2 is nearly parallel to line1, combine them by removing the
   common point, leaving a single line where both end points existed in
   the original data
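To illustrate the difference between the two rules, here is a rough Python sketch. It is my reading of the behaviour described above, not a port of agg_py_path_iterator.h: the function names, the fixed-direction test, and the 1/9 squared-pixel threshold are assumptions, and backward motion along line1 is ignored for brevity.

```python
THRESHOLD2 = 1.0 / 9.0  # squared pixels; an assumed tolerance for this sketch


def _project(origin, direction, point):
    """Return (t, perp2) for `point` relative to the line origin + t * direction."""
    dx, dy = direction
    px, py = point[0] - origin[0], point[1] - origin[1]
    norm2 = dx * dx + dy * dy
    if norm2 == 0.0:
        return 0.0, px * px + py * py
    cross = dx * py - dy * px
    return (px * dx + py * dy) / norm2, cross * cross / norm2


def simplify_extend(points, threshold2=THRESHOLD2):
    """Current rule as I understand it: lengthen line1 along its own direction."""
    if len(points) < 3:
        return list(points)
    out = [points[0]]
    origin = points[0]
    direction = (points[1][0] - origin[0], points[1][1] - origin[1])
    reach = 1.0  # length of line1 in units of `direction`
    for nxt in points[2:]:
        t, perp2 = _project(origin, direction, nxt)
        if perp2 < threshold2:
            reach = max(reach, t)  # absorb the parallel component
            continue
        # Emit the end of line1; in general this is NOT an original point.
        end = (origin[0] + reach * direction[0], origin[1] + reach * direction[1])
        out.append(end)
        origin, direction, reach = end, (nxt[0] - end[0], nxt[1] - end[1]), 1.0
    out.append((origin[0] + reach * direction[0], origin[1] + reach * direction[1]))
    return out


def simplify_subset(points, threshold2=THRESHOLD2):
    """Proposed rule: merge by dropping the common point, keeping real vertices."""
    if len(points) < 3:
        return list(points)
    out = [points[0]]
    origin, end = points[0], points[1]
    direction = (end[0] - origin[0], end[1] - origin[1])
    for nxt in points[2:]:
        _, perp2 = _project(origin, direction, nxt)
        if perp2 < threshold2:
            end = nxt  # the old end point is simply removed
            continue
        out.append(end)
        origin, direction, end = end, (nxt[0] - end[0], nxt[1] - end[1]), nxt
    out.append(end)
    return out


# A zigzag of +/-0.1 pixel around a straight line, in pixel coordinates.
pts = [(float(i), 0.1 * (i % 2)) for i in range(10)]
print(simplify_extend(pts))
print(simplify_subset(pts))
```

On this zigzag the first function emits vertices with y near 0.3, outside the range of the data, while the second returns only vertices taken from `pts`.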

Thanks again,
Andrew Hawryluk
attachment: test.png

test.pdf
Description: test.pdf