2011/1/1 OKB (not okblacke) <brenb...@brenbarn.net>:
>        I noticed that the boxplot function incorrectly calculates the
> location of the median line in each box.  As a simple example, plotting
> the dataset [1, 2, 3, 4] incorrectly plots the median line at 3.

It seems to work fine in matplotlib 1.0.0:

u...@host:~$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib as mpl
>>> mpl.__version__
'1.0.0'
>>> import matplotlib.pyplot as plt
>>> import matplotlib.mlab as mlab
>>> plt.ion()
>>> plt.boxplot([1, 2, 3, 4])
{'medians': [<matplotlib.lines.Line2D object at 0x3ad6250>], 'fliers':
[<matplotlib.lines.Line2D object at 0x3ad6610>,
<matplotlib.lines.Line2D object at 0x3ad69d0>], 'whiskers':
[<matplotlib.lines.Line2D object at 0x3acff50>,
<matplotlib.lines.Line2D object at 0x3ad4310>], 'boxes':
[<matplotlib.lines.Line2D object at 0x3ad4e50>], 'caps':
[<matplotlib.lines.Line2D object at 0x3ad46d0>,
<matplotlib.lines.Line2D object at 0x3ad4a90>]}
>>> plt.grid()
>>> plt.boxplot([1, 2, 3, 4])
{'medians': [<matplotlib.lines.Line2D object at 0x3dfbad0>], 'fliers':
[<matplotlib.lines.Line2D object at 0x3dfbe90>,
<matplotlib.lines.Line2D object at 0x3dff290>], 'whiskers':
[<matplotlib.lines.Line2D object at 0x3df8810>,
<matplotlib.lines.Line2D object at 0x3df8b90>], 'boxes':
[<matplotlib.lines.Line2D object at 0x3dfb710>], 'caps':
[<matplotlib.lines.Line2D object at 0x3df8f50>,
<matplotlib.lines.Line2D object at 0x3dfb350>]}
>>> plt.grid()
>>> # See attached image.
...
>>> mlab.prctile([1, 2, 3, 4])
array([ 1.  ,  1.75,  2.5 ,  3.25,  4.  ])
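
For comparison, here is a minimal pure-Python sketch that reproduces the five values printed above by linear interpolation between neighbouring sorted elements. The helper name percentile_linear is hypothetical, and whether mlab.prctile in 1.0.0 computes its result this way internally is an assumption; the sketch only shows that the printed values are consistent with a median of 2.5 for this dataset.

```python
import math
import statistics

def percentile_linear(data, p):
    """Percentile by linear interpolation (hypothetical helper, not mlab's code)."""
    xs = sorted(data)
    pos = (p / 100.0) * (len(xs) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)                # element at or below the position
    hi = min(lo + 1, len(xs) - 1)       # next element, clamped at the end
    frac = pos - lo                     # how far we are between lo and hi
    return xs[lo] + frac * (xs[hi] - xs[lo])

data = [1, 2, 3, 4]
quartiles = [percentile_linear(data, p) for p in (0, 25, 50, 75, 100)]
print(quartiles)                 # [1.0, 1.75, 2.5, 3.25, 4.0]
print(statistics.median(data))   # 2.5, the mean of the middle two values
```

The 50th percentile and the plain median agree here, which is what one would expect the median line of the box to show.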

Goyo

>
>        It also seems that the quartile calculations for the box are a
> little peculiar.  I have seen some discussion in old mailing list
> postings about mlab.prctile and its ways of calculating percentiles,
> which are different than those of some other software.
>
>        I'm aware that there is legitimate disagreement about the "best"
> way to calculate the quartiles.  However, it seems to me that mlab's way
> is still not any of these possibly-correct ways, because it uses int()
> or ndarray.astype(int) to coerce the percentile result to an integer
> index.  This TRUNCATES the floating-point result.  No accepted quantile-
> calculating method that I'm aware of does this; they all ROUND instead
> of truncating (if they want to coerce to an integer index at all, in
> order to produce a quantile value that is an element of the data set),
> or in some cases they round uniformly up for the lower quartile and
> down for the upper.  You can see a summary of different methods at
> http://www.amstat.org/publications/jse/v14n3/langford.html ; the method
> used by mlab does not appear to agree with any of these.
>
>        I would suggest that mlab.prctile be fixed to conform to some one
> or other of these methods, rather than adding to the proliferation of
> approaches to quantile-calculation.  Is there any motivation for always
> truncating to integer (other than "it's quicker to type" :-)?
>
>        Also, regardless of these quartile issues, there is, as far as I'm
> aware, no one who denies that the median of a (sorted) data set with an
> even number of values is the mean of the middle two values.  Since numpy
> is already a dependency for matplotlib, boxplot shouldn't use
> mlab.prctile at all to decide where to plot the median line -- just use
> numpy.median.
>
> Thanks,
> --
> --OKB (not okblacke)
> Brendan Barnwell
> "Do not follow where the path may lead.  Go, instead, where there is
> no path, and leave a trail."
>        --author unknown
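
The truncation-versus-rounding point in the quoted message can be illustrated with a small sketch. The index formula p/100 * n and both helper names are assumptions chosen for illustration only; they are not taken from the mlab source. Under that assumed formula, truncating and rounding pick different elements, and truncation on [1, 2, 3, 4] lands on 3 rather than on the mean of the middle two values.

```python
def index_truncated(n, p):
    """Integer index obtained by truncating p/100 * n (assumed formula)."""
    return min(int(p / 100.0 * n), n - 1)

def index_rounded(n, p):
    """Integer index obtained by rounding p/100 * n (assumed formula)."""
    return min(int(round(p / 100.0 * n)), n - 1)

data = sorted([10, 20, 30, 40, 50, 60, 70])
n = len(data)

# Lower quartile: 25/100 * 7 = 1.75, so the two coercions disagree.
q1_truncated = data[index_truncated(n, 25)]   # index 1
q1_rounded = data[index_rounded(n, 25)]       # index 2
print(q1_truncated, q1_rounded)               # 20 30

# Median of an even-sized set: 50/100 * 4 = 2.0 selects the third element.
small = [1, 2, 3, 4]
median_by_index = small[index_truncated(len(small), 50)]
median_by_mean = (small[1] + small[2]) / 2.0
print(median_by_index, median_by_mean)        # 3 2.5
```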

<<attachment: boxplot_sample.png>>

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users