On Tue, Oct 27, 2009 at 7:56 AM, Gökhan Sever <gokhanse...@gmail.com> wrote:
> Hello,
>
> Consider this sample two columns of data:
>
> 999999.9999 999999.9999
> 999999.9999 999999.9999
> 999999.9999 999999.9999
> 999999.9999 1693.9069
> 999999.9999 1676.1059
> 999999.9999 1621.5875
> 651.8040 1542.1373
> 691.0138 1650.4214
> 678.5558 1710.7311
> 621.5777 999999.9999
> 644.8341 999999.9999
> 696.2080 999999.9999
>
> Putting into this data into a file say "sample.data" and loading with:
>
> a,b = np.loadtxt('sample.data', dtype="float").T
>
> I[16]: a
> O[16]:
> array([ 1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
> 1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
> 6.51804000e+02, 6.91013800e+02, 6.78555800e+02,
> 6.21577700e+02, 6.44834100e+02, 6.96208000e+02])
>
> I[17]: b
> O[17]:
> array([ 999999.9999, 999999.9999, 999999.9999, 1693.9069,
> 1676.1059, 1621.5875, 1542.1373, 1650.4214,
> 1710.7311, 999999.9999, 999999.9999, 999999.9999])
>
> ### interestingly, the second column is loaded as it is but a values
> reformed a little. Why this could be happening? Any idea? Anyways, back to
> masked arrays:
>
> I[24]: am = ma.masked_values(a, value=999999.9999)
>
> I[25]: am
> O[25]:
> masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
> 644.8341 696.208],
> mask = [ True True True True True True False False False
> False False False],
> fill_value = 999999.9999)
>
>
> I[30]: bm = ma.masked_values(b, value=999999.9999)
>
> I[31]: am
> O[31]:
> masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
> 644.8341 696.208],
> mask = [ True True True True True True False False False
> False False False],
> fill_value = 999999.9999)
>
>
> So far so good.
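Here is a minimal sketch of the loading and masking steps quoted above. It assumes the sample rows are read from an in-memory string rather than a `sample.data` file; `SAMPLE` and `MISSING` are illustrative names, not from the thread. The sketch also addresses the question about `a`. The parsed values are unchanged. Current NumPy prints an array in scientific notation when the ratio of its largest to smallest nonzero magnitude exceeds 1000 (unless `suppress=True` is set). That condition holds for `a` (about 1609) but not for `b` (about 648).

```python
import io

import numpy as np
import numpy.ma as ma

# The twelve sample rows from the message; 999999.9999 marks a missing value.
SAMPLE = """\
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 1693.9069
999999.9999 1676.1059
999999.9999 1621.5875
651.8040 1542.1373
691.0138 1650.4214
678.5558 1710.7311
621.5777 999999.9999
644.8341 999999.9999
696.2080 999999.9999
"""

MISSING = 999999.9999  # illustrative name for the sentinel value

# Read both columns from an in-memory file instead of "sample.data".
a, b = np.loadtxt(io.StringIO(SAMPLE), dtype=float).T

# The stored value is exactly the sentinel; only the printed form of `a` differs.
stored_exactly = bool(a[0] == MISSING)

# Mask every entry equal (within masked_values' default tolerance) to the sentinel.
am = ma.masked_values(a, MISSING)
bm = ma.masked_values(b, MISSING)

print(stored_exactly)          # True
print(am.count(), bm.count())  # 6 6
```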
> A few basic checks:
>
> I[33]: am/bm
> O[33]:
> masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712
> 0.39664667346 -- -- --],
> mask = [ True True True True True True False False False
> True True True],
> fill_value = 999999.9999)
>
>
> I[34]: mean(am/bm)
> O[34]: 0.41266624676580849
>
> Unfortunately, matplotlib.mlab's prctile cannot handle this division:
>
> I[54]: prctile(am/bm, p=[5,25,50,75,95])
> O[54]:
> array([ 3.96646673e-01, 6.21577700e+02, 1.00000000e+06,
> 1.00000000e+06, 1.00000000e+06])
>
>
> This also results with wrong looking box-and-whisker plots.
>
>
> Testing further with scipy.stats functions yields expected correct results:
These results should not be correct if you use scipy.stats.scoreatpercentile: it doesn't have correct missing-value handling, and treats NaNs or mask/fill values as regular numbers sorted to the end.

stats.mstats.scoreatpercentile is the corresponding function for masked arrays.

(BTW, I wasn't able to quickly copy and paste your example because MaskedArrays don't seem to have a constructive __repr__, i.e. no commas.)

I don't know anything about the matplotlib story.

Josef

>
> I[55]: stats.scoreatpercentile(am/bm, per=5)
> O[55]: 0.40877012449846228
>
> I[49]: stats.scoreatpercentile(am/bm, per=25)
> O[49]:
> masked_array(data = --,
> mask = True,
> fill_value = 1e+20)
>
> I[56]: stats.scoreatpercentile(am/bm, per=95)
> O[56]:
> masked_array(data = --,
> mask = True,
> fill_value = 1e+20)
>
>
> Any confirmation?
>
> --
> Gökhan
>
> _______________________________________________
> NumPy-Discussion mailing list
> numpy-discuss...@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
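Below is a sketch of the missing-value issue Josef describes, rebuilding `am/bm` from the sample data. `MISSING` and `PERCENTS` are illustrative names. `np.percentile` stands in for a mask-unaware routine; this is an assumption, not a reproduction of what `mlab.prctile` or the 2009 `stats.scoreatpercentile` did internally.

When the masked ratios are filled with the sentinel and passed to such a routine, the 5th percentile comes out at about 0.4088, the same value as O[55] in the thread. The higher percentiles, however, land on the sentinel, which is consistent with Josef's explanation. Two workarounds follow:

* drop the masked entries with `compressed()`;
* use `scipy.stats.mstats.scoreatpercentile`.

The second option interpolates with plotting positions (`alphap=betap=0.4` by default), so its numbers may differ slightly from `np.percentile`.

```python
import numpy as np
import numpy.ma as ma
from scipy.stats import mstats

MISSING = 999999.9999  # sentinel used in the sample data (illustrative name)

# The two sample columns from the thread.
a = np.array([MISSING] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([MISSING] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [MISSING] * 3)

# Masked division: here the result is masked wherever either operand is masked,
# leaving three valid ratios (positions 6, 7 and 8).
ratio = ma.masked_values(a, MISSING) / ma.masked_values(b, MISSING)

PERCENTS = [5, 25, 50, 75, 95]

# Failure mode: a routine that ignores the mask treats the fill values as
# ordinary numbers, so the sentinels sort to the end and dominate the upper percentiles.
naive = np.percentile(ratio.filled(MISSING), PERCENTS)

# Workaround 1: drop masked entries, then use any plain-array percentile function.
valid = ratio.compressed()
plain = np.percentile(valid, PERCENTS)

# Workaround 2: the masked-array-aware function from scipy.stats.mstats,
# which only looks at the unmasked values.
masked_aware = [float(mstats.scoreatpercentile(ratio, p)) for p in PERCENTS]

print("naive:     ", naive)
print("compressed:", plain)
print("mstats:    ", masked_aware)
```

For box-and-whisker plots, the same idea applies: passing `ratio.compressed()` instead of the masked array keeps the sentinels out of the statistics.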