2011/1/1 OKB (not okblacke) <brenb...@brenbarn.net>:
> I noticed that the boxplot function incorrectly calculates the
> location of the median line in each box. As a simple example, plotting
> the dataset [1, 2, 3, 4] incorrectly plots the median line at 3.

It seems to work fine in matplotlib 1.0.0:

u...@host:~$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib as mpl
>>> mpl.__version__
'1.0.0'
>>> import matplotlib.pyplot as plt
>>> import matplotlib.mlab as mlab
>>> plt.ion()
>>> plt.boxplot([1, 2, 3, 4])
{'medians': [<matplotlib.lines.Line2D object at 0x3ad6250>], 'fliers': [<matplotlib.lines.Line2D object at 0x3ad6610>, <matplotlib.lines.Line2D object at 0x3ad69d0>], 'whiskers': [<matplotlib.lines.Line2D object at 0x3acff50>, <matplotlib.lines.Line2D object at 0x3ad4310>], 'boxes': [<matplotlib.lines.Line2D object at 0x3ad4e50>], 'caps': [<matplotlib.lines.Line2D object at 0x3ad46d0>, <matplotlib.lines.Line2D object at 0x3ad4a90>]}
>>> plt.grid()
>>> plt.boxplot([1, 2, 3, 4])
{'medians': [<matplotlib.lines.Line2D object at 0x3dfbad0>], 'fliers': [<matplotlib.lines.Line2D object at 0x3dfbe90>, <matplotlib.lines.Line2D object at 0x3dff290>], 'whiskers': [<matplotlib.lines.Line2D object at 0x3df8810>, <matplotlib.lines.Line2D object at 0x3df8b90>], 'boxes': [<matplotlib.lines.Line2D object at 0x3dfb710>], 'caps': [<matplotlib.lines.Line2D object at 0x3df8f50>, <matplotlib.lines.Line2D object at 0x3dfb350>]}
>>> plt.grid()
>>> # See attached image.
...
>>> mlab.prctile([1, 2, 3, 4])
array([ 1.  ,  1.75,  2.5 ,  3.25,  4.  ])

Goyo

> It also seems that the quartile calculations for the box are a
> little peculiar. I have seen some discussion in old mailing list
> postings about mlab.prctile and its ways of calculating percentiles,
> which are different than those of some other software.
>
> I'm aware that there is legitimate disagreement about the "best"
> way to calculate the quartiles. However, it seems to me that mlab's way
> is still not any of these possibly-correct ways, because it uses int()
> or nparray.astype(int) to coerce the percentile result to an integer
> index. This TRUNCATES the floating-point result. No accepted
> quantile-calculating method that I'm aware of does this; they all ROUND
> instead of truncating (if they want to coerce to an integer index at
> all, in order to produce a quantile value that is an element of the
> data set), or in some cases they round uniformly up for the lower
> quartile and down for the upper. You can see a summary of different
> methods at http://www.amstat.org/publications/jse/v14n3/langford.html ;
> the method used by mlab does not appear to agree with any of these.
>
> I would suggest that mlab.prctile be fixed to conform to some one
> or other of these methods, rather than adding to the proliferation of
> approaches to quantile-calculation. Is there any motivation for always
> truncating to integer (other than "it's quicker to type" :-)?
>
> Also, regardless of these quartile issues, there is, as far as I'm
> aware, no one who denies that the median of a (sorted) data set with an
> even number of values is the mean of the middle two values. Since numpy
> is already a dependency for matplotlib, boxplot shouldn't use
> mlab.prctile at all to decide where to plot the median line -- just use
> numpy.median.
>
> Thanks,
> --
> --OKB (not okblacke)
> Brendan Barnwell
> "Do not follow where the path may lead. Go, instead, where there is
> no path, and leave a trail."
> --author unknown
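
To make the truncate-versus-round point in the quoted message concrete, here is a minimal standalone sketch. It is not mlab.prctile's code: the helper name element_at_fraction and the position convention fraction * (n - 1) are assumptions chosen only for illustration. It shows that coercing the same fractional position with int() or with round() selects different elements, and that the median of [1, 2, 3, 4] is the mean of the middle two values.

```python
import statistics


def element_at_fraction(sorted_data, fraction, coerce):
    """Pick the element at index coerce(fraction * (n - 1)).

    Hypothetical helper for illustration; NOT the mlab.prctile code.
    The position convention fraction * (n - 1) is an assumption.
    """
    position = fraction * (len(sorted_data) - 1)
    return sorted_data[coerce(position)]


data = [1, 2, 3, 4]

# Lower quartile: the fractional position is 0.25 * 3 = 0.75.
truncated = element_at_fraction(data, 0.25, int)    # int(0.75) == 0
rounded = element_at_fraction(data, 0.25, round)    # round(0.75) == 1

# Median of an even-sized data set: mean of the two middle values.
median = statistics.median(data)

print(truncated, rounded, median)
```

With numpy available, numpy.median(data) gives the same median as statistics.median here.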
<<attachment: boxplot_sample.png>>
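
For comparison with the mlab.prctile output in the session above, the following sketch computes percentiles by linear interpolation between neighbouring elements. The function name interpolated_percentile and the position convention fraction * (n - 1) are assumptions for illustration; the sketch makes no claim about how mlab.prctile is implemented in any particular release, only that this convention yields the same five values for [1, 2, 3, 4].

```python
import math


def interpolated_percentile(sorted_data, fraction):
    """Linearly interpolate between the two elements around the position.

    Hypothetical helper; the convention position = fraction * (n - 1)
    is an assumption, not taken from the mlab source.
    """
    last = len(sorted_data) - 1
    position = fraction * last
    lower = math.floor(position)
    upper = min(lower + 1, last)   # stay in range at the top end
    weight = position - lower      # fractional part of the position
    return sorted_data[lower] + weight * (sorted_data[upper] - sorted_data[lower])


data = [1, 2, 3, 4]
quartiles = [interpolated_percentile(data, f) for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(quartiles)
```

No integer coercion of the position is used to select the result, so the 50th percentile lands between the two middle values rather than on one of them.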
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users