On Fri, Jun 24, 2011 at 6:11 PM, Wes McKinney <[email protected]> wrote:
> On Fri, Jun 24, 2011 at 8:02 PM, Charles R Harris
> <[email protected]> wrote:
> >
> >
> > On Fri, Jun 24, 2011 at 5:22 PM, Wes McKinney <[email protected]> wrote:
> >>
> >> On Fri, Jun 24, 2011 at 7:10 PM, Charles R Harris
> >> <[email protected]> wrote:
> >> >
> >> >
> >> > On Fri, Jun 24, 2011 at 4:21 PM, Matthew Brett <[email protected]>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Fri, Jun 24, 2011 at 10:09 PM, Benjamin Root <[email protected]>
> >> >> wrote:
> >> >> ...
> >> >> > Again, there are pros and cons either way and I see them very
> >> >> > orthogonal and complementary.
> >> >>
> >> >> That may be true, but I imagine only one of them will be implemented.
> >> >>
> >> >> @Mark - I don't have a clear idea whether you consider the nafloat64
> >> >> option to be still in play as the first thing to be implemented
> >> >> (before array.mask). If it is, what kind of thing would persuade you
> >> >> either way?
> >> >>
> >> >
> >> > Mark can speak for himself, but I think things are tending towards
> >> > masks. They have the advantage of one implementation for all data
> >> > types, current and future, and they are more flexible since the masked
> >> > data can be actual valid data that you just choose to ignore for
> >> > experimental reasons.
> >> >
> >> > What might be helpful is a routine to import/export R files, but that
> >> > shouldn't be too difficult to implement.
> >> >
> >> > Chuck
> >> >
> >> >
> >> > _______________________________________________
> >> > NumPy-Discussion mailing list
> >> > [email protected]
> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >
> >> >
> >>
> >> Perhaps we should make a wiki page someplace summarizing pros and cons
> >> of the various implementation approaches? I worry very seriously about
> >> adding API functions relating to masks rather than having special NA
> >> values which propagate in algorithms.
> >> The question is: will Joe Blow
> >> Former R user have to understand what the mask is and how to work with
> >> it? If the answer is yes we have a problem. If it can be completely
> >> hidden as an implementation detail, that's great. In R NAs are just
> >> sort of inherent: they propagate, and you deal with them when you have
> >> to via the na.rm flag in functions or is.na.
> >>
> >
> > Well, I think both of those can be pretty transparent. Could you
> > illustrate some typical R usage, to wit:
> >
> > 1) setting a value to na
> > 2) checking a value for na
> >
> > Other things are problematic, like checking for integer overflow. For
> > safety that would be desirable, for speed not. I think that is a
> > separate question however. In any case, if we do check such things we
> > should be able to set the corresponding mask value in the loop, and I
> > suppose that is the sort of thing you want.
> >
> > Chuck
> >
>
> I think anyone making decisions about this needs to have a pretty good
> understanding of what R does.
> So here's some examples but you guys
> really need to spend some time with R if you have not already
>
> arr <- rnorm(20)
> arr
> [1] 1.341960278 0.757033314 -0.910468762 -0.475811935 -0.007973053
> [6] 1.618201117 -0.965747088 0.386811224 0.229158237 0.987050613
> [11] 1.293453170 -2.432399045 -0.247593481 -0.639769586 -0.464996583
> [16] 0.720181047 0.846607030 0.486173088 -0.911247626 0.370326788
> arr[5:10] = NA
> arr
> [1] 1.3419603 0.7570333 -0.9104688 -0.4758119 NA NA
> [7] NA NA NA NA 1.2934532 -2.4323990
> [13] -0.2475935 -0.6397696 -0.4649966 0.7201810 0.8466070 0.4861731
> [19] -0.9112476 0.3703268
> is.na(arr)
> [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> mean(arr)
> [1] NA
> mean(arr, na.rm=T)
> [1] -0.01903945
>
> arr + rnorm(20)
> [1] 2.081580297 0.505050028 -0.696287035 -1.280323279 NA
> [6] NA NA NA NA NA
> [11] 2.166078369 -1.445271291 0.764894624 0.795890929 0.549621207
> [16] 0.005215596 -0.170001426 0.712335355 -0.919671745 -0.617099818
>
> and obviously this is OK too:
>
> arr <- rep('wes', 10)
> arr[5:7] <- NA
> is.na(arr)
> [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
>
> note, NA gets excluded from categorical variables (factors):
> as.factor(arr)
> [1] wes wes wes wes <NA> <NA> <NA> wes wes wes
> Levels: wes
>
> e.g. groupby with NA:
>
> > tapply(rnorm(10), arr, mean)
> wes
> -0.5271853
>

I think those are all doable. The main concerns I have at the moment are:

1) Tracking things like integer overflow, yes, no.
2) Memory. I suppose masks could be packed into bits if it came to that.

Chuck
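
For comparison, here is a rough NumPy sketch of the R session quoted above. It is only an illustration under stated assumptions: it uses the existing numpy.ma masked arrays as a stand-in for the proposed array.mask (whose API is not settled in this thread), shows the NaN-sentinel alternative for floats, and uses np.packbits to illustrate point 2, packing a boolean mask into bits. All variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(20)      # R: arr <- rnorm(20)

# --- Mask-based approach (numpy.ma as a stand-in for array.mask) ---
arr = np.ma.masked_array(values, mask=np.zeros(20, dtype=bool))

# R: arr[5:10] = NA   (R is 1-based and inclusive, so indices 4..9 here)
arr[4:10] = np.ma.masked

# R: is.na(arr)
is_na = np.ma.getmaskarray(arr)

# Masking does not overwrite the payload; the hidden values are still there.
payload_kept = bool(arr.data[4] == values[4])

# R: mean(arr, na.rm=T)   (masked arrays skip masked entries by default)
mean_skipping_na = arr.mean()

# R: arr + rnorm(20)   (the mask propagates through arithmetic)
total = arr + rng.standard_normal(20)

# --- NaN-sentinel approach (floats only) ---
nan_arr = values.copy()
nan_arr[4:10] = np.nan
mean_propagating = nan_arr.mean()     # NaN, like R's mean(arr) giving NA
mean_nan_removed = np.nanmean(nan_arr)

# --- Memory: pack the boolean mask into bits ---
packed = np.packbits(is_na)           # 20 flags fit in 3 bytes
unpacked = np.unpackbits(packed, count=is_na.size).astype(bool)

print(is_na.nbytes, "bytes as bool mask,", packed.nbytes, "bytes packed")
```

The packed form costs one bit per element instead of one byte, at the price of unpacking (or bit twiddling) whenever the mask is consulted in an inner loop.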
