Re: [R] dplyr: producing a good old data frame

2015-02-25 Thread Hadley Wickham
Hi John,

Just printing the result gives a good indication where the problem lies:

 frm %>% rowwise() %>% do(MM=max(as.numeric(.)))
Source: local data frame [6 x 1]
Groups: <by row>

        MM
1 <dbl[1]>
2 <dbl[1]>
3 <dbl[1]>
4 <dbl[1]>
5 <dbl[1]>
6 <dbl[1]>

do() is designed to produce scalars (e.g. a linear model), not
vectors, so it doesn't join the results back into a single vector. You
can either fix this yourself with unlist(), or use tidyr::unnest()
which will also handle vectors with length > 1.
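
To make the list-column point concrete, here is a minimal base-R sketch. It does not call dplyr at all: the list column is built by hand with lapply() to mimic the shape of the do() result, and the names row_max, res and id are illustrative only. The data are the six rows from the original post.

```r
# Sketch only: base R, no dplyr. The list column below is built by hand
# to imitate what do() returns (one length-1 numeric vector per row).
frm <- data.frame(
  AA = c(123, 12, 127, 133, 66, 38),
  BB = c(50, 30, 147, 32, 235, 264),
  CC = c(45, 231, 100, 129, 71, 261)
)

# One length-1 numeric vector per row, collected in a list.
row_max <- lapply(seq_len(nrow(frm)), function(i) max(as.numeric(frm[i, ])))

res <- data.frame(id = seq_len(nrow(frm)))
res$MM <- row_max          # MM is a list column, not a numeric vector

is.list(res$MM)            # TRUE; mean(res$MM) would warn and return NA

res$MM <- unlist(res$MM)   # flatten into an ordinary numeric vector
is.numeric(res$MM)         # TRUE
mean(res$MM)
```

unlist() is enough here because every element has length 1; with longer elements the flattened vector would no longer line up with the rows, which is the case unnest() is meant for.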

Hadley

On Mon, Feb 23, 2015 at 2:54 PM, John Posner john.pos...@mjbiostat.com wrote:
 I'm using the dplyr package to perform one-row-at-a-time processing of a data 
 frame:

 rnd6 = function() sample(1:300, 6)
 frm = data.frame(AA=rnd6(), BB=rnd6(), CC=rnd6())

 frm
    AA  BB  CC
 1 123  50  45
 2  12  30 231
 3 127 147 100
 4 133  32 129
 5  66 235  71
 6  38 264 261

 The interface is nice and straightforward:

 library(dplyr)
 dplyr_result = frm %>% rowwise() %>% do(MM=max(as.numeric(.)))

 I've gotten used to the fact that dplyr_result is not a good old vanilla 
 data frame. The as.data.frame() function *seems* to do the trick:

 dplyr_result_2 = as.data.frame(dplyr_result)
 dplyr_result_2
    MM
 1 123
 2 231
 3 147
 4 133
 5 235
 6 264

 ... but there's trouble ahead:

 mean(dplyr_result_2$MM)
 [1] NA
 Warning message:
 In mean.default(dplyr_result_2$MM) :
   argument is not numeric or logical: returning NA

 I need to enlist unlist() to get me to my destination:

 mean(unlist(dplyr_result_2$MM))
 [1] 188.8333

 [NOTE: dplyr's as_data_frame() function does a better job than 
 as.data.frame() of indicating that I was headed for trouble. ]

 By contrast, the plyr package's adply() function *does* produce a vanilla 
 data frame:

   library(plyr)
 plyr_result = adply(frm, .margins=1, function(onerowfrm) 
 max(as.numeric(onerowfrm[1,])))
 mean(plyr_result$V1)
 [1] 188.8333

 Is there a good reason for dplyr to require the extra processing? My (naïve 
 ?) recommendation would be to have as_data_frame() produce a vanilla data 
 frame.

 Tx,
 John

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
http://had.co.nz/


[R] dplyr: producing a good old data frame

2015-02-23 Thread John Posner
I'm using the dplyr package to perform one-row-at-a-time processing of a data 
frame:

 rnd6 = function() sample(1:300, 6)
 frm = data.frame(AA=rnd6(), BB=rnd6(), CC=rnd6())

 frm
   AA  BB  CC
1 123  50  45
2  12  30 231
3 127 147 100
4 133  32 129
5  66 235  71
6  38 264 261

The interface is nice and straightforward:

 library(dplyr)
 dplyr_result = frm %>% rowwise() %>% do(MM=max(as.numeric(.)))

I've gotten used to the fact that dplyr_result is not a good old vanilla data 
frame. The as.data.frame() function *seems* to do the trick:

 dplyr_result_2 = as.data.frame(dplyr_result)
 dplyr_result_2
   MM
1 123
2 231
3 147
4 133
5 235
6 264

... but there's trouble ahead:

 mean(dplyr_result_2$MM)
[1] NA
Warning message:
In mean.default(dplyr_result_2$MM) :
  argument is not numeric or logical: returning NA

I need to enlist unlist() to get me to my destination:

 mean(unlist(dplyr_result_2$MM))
[1] 188.8333

[NOTE: dplyr's as_data_frame() function does a better job than as.data.frame() 
of indicating that I was headed for trouble. ]

By contrast, the plyr package's adply() function *does* produce a vanilla data 
frame:

  library(plyr)
 plyr_result = adply(frm, .margins=1, function(onerowfrm) 
 max(as.numeric(onerowfrm[1,])))
 mean(plyr_result$V1)
[1] 188.8333
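
As a point of comparison, here is a minimal base-R sketch, assuming the goal is simply one numeric maximum per row. It uses neither dplyr nor plyr; the column name MM follows the dplyr example, and the names out and mm_apply are illustrative only.

```r
# Sketch only: row-wise maxima in base R, giving a plain numeric column.
frm <- data.frame(
  AA = c(123, 12, 127, 133, 66, 38),
  BB = c(50, 30, 147, 32, 235, 264),
  CC = c(45, 231, 100, 129, 71, 261)
)

# pmax() takes element-wise maxima across its arguments;
# do.call() passes each column of frm as one argument.
out <- frm
out$MM <- do.call(pmax, frm)

# The same values via apply() over the rows of the numeric matrix.
mm_apply <- apply(as.matrix(frm), 1, max)

is.numeric(out$MM)   # TRUE, so mean() works without unlist()
mean(out$MM)
```

Both forms assume all columns are numeric; with mixed column types as.matrix() would coerce everything to character and the maxima would no longer be numeric.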

Is there a good reason for dplyr to require the extra processing? My (naïve ?) 
recommendation would be to have as_data_frame() produce a vanilla data frame.

Tx,
John
