I'm using the dplyr package to perform one-row-at-a-time processing of a data frame:

> rnd6 = function() sample(1:300, 6)
> frm = data.frame(AA=rnd6(), BB=rnd6(), CC=rnd6())
> frm
   AA  BB  CC
1 123  50  45
2  12  30 231
3 127 147 100
4 133  32 129
5  66 235  71
6  38 264 261

The interface is nice and straightforward:

> library(dplyr)
> dplyr_result = frm %>% rowwise() %>% do(MM=max(as.numeric(.)))

I've gotten used to the fact that dplyr_result is not a good old "vanilla" data frame. The as.data.frame() function *seems* to do the trick:

> dplyr_result_2 = as.data.frame(dplyr_result)
> dplyr_result_2
   MM
1 123
2 231
3 147
4 133
5 235
6 264

... but there's trouble ahead:

> mean(dplyr_result_2$MM)
[1] NA
Warning message:
In mean.default(dplyr_result_2$MM) :
  argument is not numeric or logical: returning NA

I need to enlist unlist() to get me to my destination:

> mean(unlist(dplyr_result_2$MM))
[1] 188.8333

[NOTE: dplyr's as_data_frame() function does a better job than as.data.frame() of indicating that I was headed for trouble.]

By contrast, the plyr package's adply() function *does* produce a vanilla data frame:

> library(plyr)
> plyr_result = adply(frm, .margins=1, function(onerowfrm)
+     max(as.numeric(onerowfrm[1,])))
> mean(plyr_result$V1)
[1] 188.8333

Is there a good reason for dplyr to require the extra processing? My (naïve?) recommendation would be to have as_data_frame() produce a vanilla data frame.

Tx,
John
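
P.S. A minimal sketch of what I suspect is going on, assuming MM comes back as a list column (one list element per row) rather than a numeric vector. I have not checked this against dplyr's internals. The data frame lst_frm below is a made-up stand-in built with base R only, using the first three MM values from above:

```r
# Hypothetical stand-in for the dplyr result: a data frame whose MM
# column is a list with one element per row (an assumption, not
# something confirmed from dplyr's source).
lst_frm <- data.frame(id = 1:3)
lst_frm$MM <- list(123, 231, 147)

# The column is a list, so mean() has nothing numeric to work with.
is.list(lst_frm$MM)

# Returns NA with the same "argument is not numeric or logical" warning.
bad_mean <- mean(lst_frm$MM)

# Flattening the list to a numeric vector makes mean() behave.
good_mean <- mean(unlist(lst_frm$MM))
good_mean
```

If that assumption holds, calling str() on dplyr_result_2 should report MM as a list, which would explain why unlist() is needed before mean().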