On Tue, Aug 21, 2012 at 8:56 AM, Virgil Stokes <v...@it.uu.se> wrote:
> On 21-Aug-2012 17:50, Paul Hobson wrote:
>>
>> On Tue, Aug 21, 2012 at 7:58 AM, Virgil Stokes <v...@it.uu.se> wrote:
>>>
>>> In reference to my previous email.
>>>
>>> How can I find the outliers (sample points beyond the whiskers) in the
>>> data
>>> used for the boxplot?
>>>
>>> Here is a code snippet that shows how it was used for the timings data (a
>>> list
>>> of 4 sublists (y1,y2,y3,y4), each containing 400,000 real data values),
>>>     ...
>>>     ...
>>>     ...
>>>     # Box Plots
>>>     plt.subplot(2,1,2)
>>>     timings = [y1,y2,y3,y4]
>>>     pos = np.array(range(len(timings)))+1
>>>     bp = plt.boxplot( timings, sym='k+', patch_artist=True,
>>>                      positions=pos, notch=1, bootstrap=5000 )
>>>
>>>     plt.xlabel('Algorithm')
>>>     plt.ylabel('Execution time (sec)')
>>>     plt.ylim(0.9*ymin,1.1*ymax)
>>>
>>>     plt.setp(bp['whiskers'], color='k',  linestyle='-' )
>>>     plt.setp(bp['fliers'], markersize=3.0)
>>>     plt.title('Box plots (%4d trials)' %(n))
>>>     plt.show()
>>>     ...
>>>     ...
>>>     ...
>>>
>>> Again my questions:
>>> 1) How to get the value of the median?
>>> 2) How to find the outliers (outside the whiskers)?
>>> 3) How to find the width of the notch?
>>
>> Virgil, the objects stuffed inside the `bp` dictionary should have
>> methods to retrieve their values. Let's see:
>>
>> In [35]: x = np.random.lognormal(mean=1.25, sigma=1.35, size=(37,3))
>>
>> In [36]: bp = plt.boxplot(x, bootstrap=5000, notch=True)
>>
>> In [37]: # Question 1
>>      ...: print('medians')
>>      ...: for n, median in enumerate(bp['medians']):
>>      ...:     print('%d: %f' % (n, median.get_ydata()[0]))
>>      ...:
>> medians
>> 0: 6.339692
>> 1: 3.449320
>> 2: 4.503706
>>
>> In [38]: # Question 2
>>      ...: print('fliers')
>>      ...: for n in range(0, len(bp['fliers']), 2):
>>      ...:     print('%d: upper outliers = \t' % (n/2,))
>>      ...:     print(bp['fliers'][n].get_ydata())
>>      ...:     print('\n%d: lower outliers = \t' % (n/2,))
>>      ...:     print(bp['fliers'][n+1].get_ydata())
>>      ...:     print('\n')
>>      ...:
>
> You had no outliers!
>
>>
>> In [39]: # Question 3
>>      ...: print('Confidence Intervals')
>>      ...: for n, box in enumerate(bp['boxes']):
>>      ...:     print('%d: lower CI: %f' % (n, box.get_ydata()[2]))
>>      ...:     print('%d: upper CI: %f' % (n, box.get_ydata()[4]))
>>      ...:
>> Confidence Intervals
>> 0: lower CI: 1.760701
>> 0: upper CI: 10.102221
>> 1: lower CI: 1.626386
>> 1: upper CI: 5.601927
>> 2: lower CI: 2.173173
>>
>> Hope that helps,
>> -paul
>
> Just what I was looking for Paul! Thanks very much.
>
> One final question --- Where can I find the documentation that answers my
> questions and gives more details about the equations used for the width of
> the notch, etc.?
>
> Thanks again :-)

That should all be in the boxplot docstring. Do you use ipython? If
not, you should :)

If so, just do `plt.boxplot?` at the ipython terminal and it'll show up.
-paul
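
For readers who want the equations without opening the docstring, here is a
rough sketch of the usual box plot statistics: quartiles, whiskers at 1.5
times the interquartile range, fliers beyond the whiskers, and the default
notch at median +/- 1.57 * IQR / sqrt(n). It uses only the standard library.
The helper name `box_stats` and the sample data are made up for illustration
and are not part of matplotlib. Note that when `bootstrap` is given (as in
the calls above), matplotlib estimates the notch by resampling the median
instead, so the notch values from this sketch would not match those plots.

```python
import math
import statistics

def box_stats(values, whis=1.5):
    """Hypothetical helper: box plot statistics for one data set."""
    data = sorted(values)
    n = len(data)
    # 'inclusive' interpolates linearly between order statistics
    q1, med, q3 = statistics.quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # Whiskers end at the most extreme data points inside the fences
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    # Default (non-bootstrap) notch half-width
    half_width = 1.57 * iqr / math.sqrt(n)
    return {
        'median': med, 'q1': q1, 'q3': q3,
        'whisker_low': inside[0], 'whisker_high': inside[-1],
        'fliers': [v for v in data if v < lo_fence or v > hi_fence],
        'notch_low': med - half_width, 'notch_high': med + half_width,
    }

# Made-up sample with one obvious outlier
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0]
stats = box_stats(sample)
for key, value in stats.items():
    print(key, value)
```

The values read back from the `bp` artists with get_ydata() should agree with
numbers computed this way when `bootstrap` is left at its default, up to
differences in how quartiles are interpolated.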

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
