Re: [matplotlib-devel] RFC: boxplot_enhanced & paired_stats

Skipper Seabold Fri, 16 Nov 2012 07:59:15 -0800

On Fri, Nov 16, 2012 at 10:19 AM, Yaroslav Halchenko <s...@onerussian.com> wrote:
> I just found some code (http://www.onerussian.com/tmp/plots.py and
> pasted below for review/feedback) lying around which I wrote around
> matplotlib for plotting primarily pair-wise stats results.  Here is a
> demonstration:
> http://nbviewer.ipython.org/url/www.onerussian.com/tmp/run_plots.ipynb
>
> I wonder if there is a need/place for it in matplotlib and what changes would
> you advise. Sorry for the lack of documentation -- I guess I have not finished
> it at that point (scipy dependency can easily be dropped, used only for
> standard error function iirc):
>


Looks nice. We'd certainly be interested in including it in
statsmodels/graphics if there isn't sufficient interest here and/or
you'd like to keep the scipy dependency. ;)
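
On dropping the scipy dependency: a minimal NumPy-only sketch of the
standard error, assuming the script only needs what ss.sem computes with
its default ddof=1. The name sem_numpy is hypothetical and not part of the
posted script.

```python
import numpy as np


def sem_numpy(a, axis=0, ddof=1):
    """Standard error of the mean along `axis`, using NumPy only.

    Hypothetical stand-in for scipy.stats.sem(a, axis=axis), which also
    uses ddof=1 by default.
    """
    a = np.asarray(a, dtype=float)
    n = a.shape[axis]
    # sample standard deviation divided by sqrt(number of observations)
    return np.std(a, axis=axis, ddof=ddof) / np.sqrt(n)


# Example: 4 observations (rows) for 2 conditions (columns)
d = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [4.0, 8.0]])
result = sem_numpy(d)
print(result)
```

With something like this, the two ss.sem(...) calls could be swapped out
and the scipy import removed, provided nothing else from scipy is used.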

Skipper

> #!/usr/bin/python
> #emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
> #ex: set sts=4 ts=4 sw=4 noet:
> #------------------------- =+- Python script -+= -------------------------
> """
>  @file      paired-plots.py
>  @date      Fri Jan 13 11:48:00 2012
>  @brief
>
>
>   Yaroslav Halchenko                                            Dartmouth
>   web:     http://www.onerussian.com                              College
>   e-mail:  y...@onerussian.com                              ICQ#: 60653192
>
>  DESCRIPTION (NOTES):
>
>  COPYRIGHT: Yaroslav Halchenko 2012
>
>  LICENSE: MIT
>
>   Permission is hereby granted, free of charge, to any person obtaining a copy
>   of this software and associated documentation files (the "Software"), to deal
>   in the Software without restriction, including without limitation the rights
>   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>   copies of the Software, and to permit persons to whom the Software is
>   furnished to do so, subject to the following conditions:
>
>   The above copyright notice and this permission notice shall be included in
>   all copies or substantial portions of the Software.
>
>   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>   THE SOFTWARE.
> """
> #-----------------\____________________________________/------------------
>
> __author__ = 'Yaroslav Halchenko'
> __revision__ = '$Revision: $'
> __date__ = '$Date:  $'
> __copyright__ = 'Copyright (c) 2012 Yaroslav Halchenko'
> __license__ = 'MIT'
>
>
> import numpy as np
> import pylab as pl
> import scipy.stats as ss
>
> def plot_boxplot_enhanced(
>     v,
>     contrast_labels=None,
>     condition_labels=None,
>     ccolors=['y', 'b'],
>     rand_offsets=None,
>     grid=True,
>     xticks_rotation=0,
>     **bp_kwargs):
>
>     # boxplot's keyword is 'widths' (this is what plot_paired_stats passes)
>     width = bp_kwargs.get('widths', 0.5)
>     pl.boxplot(v, **bp_kwargs)
>
>     if v.ndim < 2: v = v[:, None]
>     ncol = v.shape[1]
>
>     eff = np.mean(v, axis=0)          # effect sizes
>     sem = ss.sem(v, axis=0)
>
>     if rand_offsets is None:
>         rand_offsets = np.random.randn(len(v)) * 0.02
>
>     pl.plot((np.arange(ncol) + 1)[:, None] + rand_offsets,
>             v.T, '.', color='k', markerfacecolor='k')
>     for i in range(ncol):
>         lw = 2
>         pl.plot([1 - width/2. + i, 1+i],
>                 [0, 0],
>                 '--', color=ccolors[0], linewidth=lw) # first condition
>         pl.plot([1+i, 1 + width/2. +i],
>                 [eff[i]]*2,
>                 '--', color=ccolors[1], linewidth=lw)
>
>         # place ste
>         pl.errorbar(i+1 + 1.1*width/2.,
>                     eff[i],
>                     sem[i],
>                     elinewidth=2, linewidth=0,
>                     color='r', ecolor='r')
>
>         if contrast_labels and not i:       # only for the first one
>             pl.text(1 - 1.1*width/2 + i, 0.1, contrast_labels[0],
>                     verticalalignment='bottom',
>                     horizontalalignment='right')
>             pl.text(1 + 1.2*width/2 + i, eff[i], contrast_labels[1],
>                     verticalalignment='bottom', horizontalalignment='left')
>     ax = pl.gca()
>     if condition_labels:
>         ax.set_xticklabels(condition_labels, rotation=xticks_rotation)
>     else:
>         # hide it
>         ax.axes.xaxis.set_visible(False)
>
>     if grid:
>         ax.grid()
>     return ax
>
>
> def plot_paired_stats(
>     v0, v1, contrast_labels,
>     condition_labels=None,
>     style=['barplot_effect',
>            'boxplot_raw',
>            'boxplot_effect'],
>     ccolors=['y', 'g'],
>     xticks_rotation=0,
>     grid=False,
>     fig=None,
>     bottom_adjust=None,
>     bp_kwargs={}):
>
>     if isinstance(style, str):
>         style = [style]
>
>     nplots = len(style)                 # how many subplots will be needed
>
>     # assure having 2nd dimension
>     if v0.ndim < 2: v0 = v0[:, None]
>     if v1.ndim < 2: v1 = v1[:, None]
>     assert(v0.shape == v1.shape)
>
>     ncol = v0.shape[1]
>     v10 = (v1 - v0)                     # differences
>     mv0 = np.mean(v0, axis=0)           # means
>     mv1 = np.mean(v1, axis=0)
>
>     eff = np.mean(v10, axis=0)          # effect sizes
>     sem = ss.sem(v10, axis=0)
>
>     # so that data points are distinguishable
>     rand_offsets = np.random.randn(len(v10)) * 0.02
>
>     # interleaved combination for some plots
>     v_ = np.hstack((v0, v1))
>     v = np.zeros(v_.shape, dtype=v_.dtype)
>     v[:, np.hstack((np.arange(0, ncol*2, 2),
>                     np.arange(1, ncol*2, 2)))] = v_
>
>     #print v.shape
>     #print np.mean(v0, axis=0), np.mean(v1, axis=0)
>     #print np.min(v10, axis=0), np.max(v10, axis=0), \
>     #      np.mean(v10, axis=0), ss.sem(v10, axis=0)
>     #pl.boxplot(v10 + np.mean(v1), notch=1, widths=0.05)
>
>     #print v0.shape, v1.shape, np.hstack([v0, v1]).shape
>
>     if fig is None:
>         fig = pl.figure()
>
>     bwidth = 0.5
>     plot = 1
>
>     if condition_labels:
>         xlabels = [ '%s:%s' % (cond, contr)
>                     for cond in condition_labels
>                     for contr in contrast_labels ]
>     else:
>         xlabels = contrast_labels
>
>     bp_kwargs_ = {
>         #'bootstrap': 0,
>         'notch' : 1
>         }
>     bp_kwargs_.update(bp_kwargs)
>
>     def plot_grid(ax):
>         if grid:
>             ax.grid()
>
>     if 'barplot_effect' in style:
>         if len(style) > 1:
>             pl.subplot(1, nplots, plot)
>         plot += 1
>         # The simplest one
>         pl.bar(np.arange(1, ncol*2+1) - bwidth/2,
>                np.mean(v, axis=0),
>                color=ccolors*ncol,
>                edgecolor=ccolors*ncol,
>                alpha=0.8,
>                width=bwidth)
>         #pl.minorticks_off()
>         pl.tick_params('x', direction='out', length=6, width=1,
>                        top=False)
>         ax = pl.gca()
>         pl.xlim(0.5, ncol*2+0.5)
>         ax.set_xticks(np.arange(1, ncol*2+1))
>         ax.set_xticklabels(xlabels, rotation=xticks_rotation)
>         # place ste for effect size into the 2nd column
>         pl.errorbar(np.arange(ncol)*2+2,
>                     mv1,
>                     sem, elinewidth=2, linewidth=0,
>                     color='g', ecolor='r')
>
>         plot_grid(ax)
>
>     if 'boxplot_raw' in style:
>         if len(style) > 1:
>             pl.subplot(1, nplots, plot)
>         plot += 1
>
>         # Figure 1 -- "raw" data
>         # plot "connections" between boxplots
>         for i in range(ncol):
>             pargs = (np.array([i*2+1, i*2+2])[:, None] + rand_offsets,
>                      np.array([v0[:,i], v1[:,i]]))
>             pl.plot(*(pargs+('-',)), color='k', alpha=0.5, linewidth=0.25)
>             pl.plot(*(pargs+('.',)), color='k', alpha=0.9)
>         # boxplot of "raw" data
>         bp1 = pl.boxplot(v, widths=bwidth, **bp_kwargs_)
>         for i in range(ncol):
>             for c in range(2):
>                 b = bp1['boxes'][2*i+c]
>                 b.set_color(ccolors[c])
>                 b.set_linewidth(2)
>
>         ax = pl.gca()
>         ax.set_xticklabels(xlabels, rotation=xticks_rotation)
>         plot_grid(ax)
>
>     if 'boxplot_effect' in style:
>         if len(style) > 1:
>             pl.subplot(1, nplots, plot)
>         plot += 1
>         plot_boxplot_enhanced(v10,
>                               contrast_labels=contrast_labels,
>                               condition_labels=condition_labels,
>                               widths=bwidth,
>                               rand_offsets=rand_offsets, # reuse them
>                               grid=grid,
>                               **bp_kwargs_)
>
>     if bottom_adjust:
>         fig.subplots_adjust(bottom=bottom_adjust)
>     pl.draw_if_interactive()
>     return fig
>
> if __name__ == '__main__':
>
>     if True:
>         v = np.random.normal(size=(50,8)) * 20 + 120
>         if False:
>             v[:, 1] += 40
>             v[:, 3] -= 30
>             v[:, 5] += 60
>             v[:, 6] -= 60
>         else:
>             v -= np.arange(v.shape[1])*10
>         v /= 10
>
>     v0 = v[:, ::2]
>     v1 = v[:, 1::2]
>     d = v1 - v0
>     print(np.mean(d, axis=0))
>     styles = ['barplot_effect',
>               'boxplot_raw',
>               'boxplot_effect'
>               ]
>     styles = styles + [styles]
>     pl.close('all')
>
>     if False:
>         f = plot_boxplot_enhanced((v1-v0)[:,0],
>                                   grid=True, xticks_rotation=30, notch=1)
>
>     for s in styles:
>         fig = pl.figure(figsize=(12,6))
>         f = plot_paired_stats(v0, v1, ['cont1', 'cont2'],
>                               style=s, fig=fig,
>                               condition_labels=['exp1', 'exp2', 'exp3', 'exp4'],
>                               grid=True, xticks_rotation=30)
>     pl.show()
>
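
The "interleaved combination" step in plot_paired_stats (np.hstack plus a
fancy-indexed assignment) could also be written with strided slices. This
is only a sketch of an equivalent formulation, assuming v0 and v1 are
already 2-D with identical shapes, as the function ensures earlier. The
helper name interleave_columns is hypothetical.

```python
import numpy as np


def interleave_columns(v0, v1):
    """Return an array whose columns alternate v0[:, 0], v1[:, 0], v0[:, 1], ...

    Hypothetical helper; expects two 2-D arrays of the same shape.
    """
    v0 = np.asarray(v0)
    v1 = np.asarray(v1)
    if v0.ndim != 2 or v0.shape != v1.shape:
        raise ValueError("v0 and v1 must be 2-D arrays of the same shape")
    out = np.empty((v0.shape[0], 2 * v0.shape[1]),
                   dtype=np.result_type(v0, v1))
    out[:, 0::2] = v0   # even columns: first member of each pair
    out[:, 1::2] = v1   # odd columns: second member of each pair
    return out


v0 = np.array([[1, 2], [3, 4]])
v1 = np.array([[10, 20], [30, 40]])
v = interleave_columns(v0, v1)
print(v)
```

The column order matches what the bar plot and the raw boxplot expect:
each condition's first and second measurement sit side by side.
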
>
> --
> Yaroslav O. Halchenko
> Postdoctoral Fellow,   Department of Psychological and Brain Sciences
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> WWW:   http://www.linkedin.com/in/yarik
>
> ------------------------------------------------------------------------------
> Monitor your physical, virtual and cloud infrastructure from a single
> web console. Get in-depth insight into apps, servers, databases, vmware,
> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
> Pricing starts from $795 for 25 servers or applications!
> http://p.sf.net/sfu/zoho_dev2dev_nov
> _______________________________________________
> Matplotlib-devel mailing list
> Matplotlib-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
