On Tue, Oct 27, 2009 at 7:56 AM, Gökhan Sever <gokhanse...@gmail.com> wrote:
> Hello,
>
> Consider this sample two columns of data:
>
>  999999.9999 999999.9999
>  999999.9999 999999.9999
>  999999.9999 999999.9999
>  999999.9999   1693.9069
>  999999.9999   1676.1059
>  999999.9999   1621.5875
>     651.8040   1542.1373
>     691.0138   1650.4214
>     678.5558   1710.7311
>     621.5777 999999.9999
>     644.8341 999999.9999
>     696.2080 999999.9999
>
> Putting into this data into a file say "sample.data" and loading with:
>
> a,b = np.loadtxt('sample.data', dtype="float").T
>
> I[16]: a
> O[16]:
> array([ 1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
>         1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
>         6.51804000e+02, 6.91013800e+02, 6.78555800e+02,
>         6.21577700e+02, 6.44834100e+02, 6.96208000e+02])
>
> I[17]: b
> O[17]:
> array([ 999999.9999, 999999.9999, 999999.9999, 1693.9069,
>         1676.1059, 1621.5875, 1542.1373, 1650.4214,
>         1710.7311, 999999.9999, 999999.9999, 999999.9999])
>
> ### interestingly, the second column is loaded as it is but a values
> reformed a little. Why this could be happening? Any idea? Anyways, back to
> masked arrays:
>
> I[24]: am = ma.masked_values(a, value=999999.9999)
>
> I[25]: am
> O[25]:
> masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
>  644.8341 696.208],
>              mask = [ True True True True True True False False False
>  False False False],
>        fill_value = 999999.9999)
>
>
> I[30]: bm = ma.masked_values(b, value=999999.9999)
>
> I[31]: am
> O[31]:
> masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
>  644.8341 696.208],
>              mask = [ True True True True True True False False False
>  False False False],
>        fill_value = 999999.9999)
>
>
> So far so good.
> A few basic checks:
>
> I[33]: am/bm
> O[33]:
> masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712
>  0.39664667346 -- -- --],
>              mask = [ True True True True True True False False False
>  True True True],
>        fill_value = 999999.9999)
>
>
> I[34]: mean(am/bm)
> O[34]: 0.41266624676580849
>
> Unfortunately, matplotlib.mlab's prctile cannot handle this division:
>
> I[54]: prctile(am/bm, p=[5,25,50,75,95])
> O[54]:
> array([ 3.96646673e-01, 6.21577700e+02, 1.00000000e+06,
>         1.00000000e+06, 1.00000000e+06])
>
>
> This also results with wrong looking box-and-whisker plots.
>
>
> Testing further with scipy.stats functions yields expected correct results:

These should not be the correct results if you use
scipy.stats.scoreatpercentile: it does not handle missing values correctly,
and it treats nans or masked/fill values as regular numbers sorted to the end.

stats.mstats.scoreatpercentile is the corresponding function for masked
arrays.

(BTW, I wasn't able to quickly copy and paste your example because
MaskedArrays don't seem to have a constructive __repr__, i.e. no commas.)

I don't know anything about the matplotlib story.

Josef

>
> I[55]: stats.scoreatpercentile(am/bm, per=5)
> O[55]: 0.40877012449846228
>
> I[49]: stats.scoreatpercentile(am/bm, per=25)
> O[49]:
> masked_array(data = --,
>              mask = True,
>        fill_value = 1e+20)
>
> I[56]: stats.scoreatpercentile(am/bm, per=95)
> O[56]:
> masked_array(data = --,
>              mask = True,
>        fill_value = 1e+20)
>
>
> Any confirmation?
>
> --
> Gökhan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
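
A minimal sketch of the mstats suggestion from the reply, for anyone who wants to reproduce it. The arrays are typed in by hand from the sample data in the quoted message (no "sample.data" file is read), and the variable names `ratio`, `valid`, `percentiles`, `from_masked` and `median_compressed` are only illustrative. It assumes a SciPy version that ships scipy.stats.mstats.scoreatpercentile; the exact percentile values depend on that function's interpolation settings, so none are quoted here.

```python
import numpy as np
import numpy.ma as ma
from scipy.stats import mstats

# Sample columns copied by hand from the quoted message.
a = np.array([999999.9999] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([999999.9999] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [999999.9999] * 3)

# Mask the 999999.9999 placeholder in each column, as in the original post.
am = ma.masked_values(a, value=999999.9999)
bm = ma.masked_values(b, value=999999.9999)

# An element of the ratio is masked wherever either operand is masked.
ratio = am / bm

# Mask-aware percentiles: masked entries are left out of the computation.
percentiles = [5, 25, 50, 75, 95]
from_masked = [float(mstats.scoreatpercentile(ratio, p)) for p in percentiles]

# Alternative: drop the masked entries explicitly and use plain NumPy
# on the remaining valid values.
valid = ratio.compressed()
median_compressed = float(np.median(valid))

print("valid ratios:", valid)
print("mstats percentiles:", dict(zip(percentiles, from_masked)))
print("median of compressed data:", median_compressed)
```

Calling .compressed() first is the more defensive choice when passing data to a function that is not known to respect masks, since the result is an ordinary ndarray holding only the unmasked values.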