Hi, How about this :
http://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table Skip to the end marked EDIT and reversing 0 and NA for your case: DT = data.table(a=c(1L,0L,3L),b=c(0L,5:6),c=c(7:8,0L)) a b c [1,] 1 0 7 [2,] 0 5 8 [3,] 3 6 0 for (i in names(DT)[2:3]) DT[get(i)==0L,i:=NA_integer_,with=FALSE] > DT a b c [1,] 1 NA 7 [2,] 0 5 8 [3,] 3 6 NA > If you have 000's of columns, then this sort of thing is just what the new set() is for, to avoid the overhead of repeatedly calling [.data.table, e.g. : for (i in 2:3) set(DT,which(DT[[i]]==0L),i,NA_integer_) Without the 'which' you get an error "i is type 'logical'. Must be integer, or numeric is coerced with warning.". I'll add to that message something like "logical isn't accepted as i for speed since set() is intended for inside loops; checking and coercing logical to integer row positions takes time. Wrap logical i with which() if required". One reason for liking for() loops with data.table is that working on one column at a time makes sense for large tables (less working memory needed). I thought about constructing a list() RHS of := with a vector of column names on the LHS of :=, but that approach doesn't scale as the number of columns grows, due to needed space for the entire RHS. The for() loop above is much better for that reason. [Multiple LHS of := is more for when the RHS is a single value repeated for all the columns, such as 0L or NA, or, NULL to delete multiple columns in one step.] Finally, the .SD approach should work when bug #1732 is fixed (".SD, .N and .BY should be available when by="" and by=NULL"), but still not as efficient as the for loop above. Matthew > Dear all, > > I have a large data.table and I am simply trying to replace all zeros > with NAs in a subset of columns. So I tried first: > >> dt[, lapply(.SD, function(x) x := ifelse(x==0, NA, x)), .SDcols=3:30] > Error in lapply(.SD, function(x) `:=`(x, ifelse(x == 0, NA, x))) : > object '.SD' not found > > Clearly that's not the right approach... > > Then I tried: > >> for (i in names(dt)[3:30]) { > eval(parse(text=paste("dt[`", i, "`==0, `", i, "` := NA]", sep=""))) > } > That worked but is rather ugly. > > would you recommend any better way to avoid the eval(parse()) to > perform such simple tasks? > > Thanks in advance, --Mel. > > > _______________________________________________ > datatable-help mailing list > [email protected] > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > _______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
