Re: [R] Please explain do.call in this context, or critique to stack this list faster
Another way that I like is reshape::melt.list(), because it keeps track of the names of the original data.frames:

l <- replicate(1e4, data.frame(x=rnorm(100), y=rnorm(100)), simplify=FALSE)

system.time(a <- rbind.fill(l))
   user  system elapsed
  2.482   0.111   2.597

system.time(b <- melt(l, id=1:2))
   user  system elapsed
  6.556   0.229   6.801

system.time(c <- do.call(rbind, l))
   user  system elapsed
 55.020  71.356 129.300

all.equal(a, b[ , -3])
[1] TRUE

baptiste

On 5 September 2010 04:48, Hadley Wickham had...@rice.edu wrote:

> > One common way around this is to pre-allocate memory and then to
> > populate the object using a loop, but a somewhat easier solution here
> > turns out to be ldply() in the plyr package. The following is the same
> > idea as do.call(rbind, l), only faster:
> >
> > system.time(u3 <- ldply(l, rbind))
> >    user  system elapsed
> >    6.07    0.01    6.09
>
> I think all you want here is rbind.fill:
>
> system.time(a <- rbind.fill(l))
>    user  system elapsed
>   1.426   0.044   1.471
>
> system.time(b <- do.call(rbind, l))
>    user  system elapsed
>      98      60     162
>
> all.equal(a, b)
> [1] TRUE
>
> This is considerably faster than do.call + rbind because I spent a lot
> of time working out how to do this most efficiently. You can see the
> underlying code at http://github.com/hadley/plyr/blob/master/R/rbind.r
> - it's relatively straightforward except for ensuring the output
> columns are the same type as the input columns. This is a good example
> where optimised R code is much faster than C code.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/

__ R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Dr. Baptiste Auguié
Departamento de Química Física, Universidade de Vigo,
Campus Universitario, 36310, Vigo, Spain
tel: +34 9868 18617
http://webs.uvigo.es/coloides
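A small sketch of what melt() adds over rbind.fill() here, assuming the reshape package's melt.list method (the name-tracking column it appends for a named list is what baptiste is referring to; the exact column name is whatever melt.list produces, dropped above as `b[, -3]`):

```r
library(reshape)

# With a *named* list, melt() records which element each row came from,
# so the stacked result stays traceable to its source data frame.
l <- list(first  = data.frame(x = 1:2, y = 3:4),
          second = data.frame(x = 5:6, y = 7:8))

m <- melt(l, id = 1:2)
# m has columns x and y, plus a third column holding the element names
# ("first", "second"); dropping that column recovers the plain stack.
head(m)
```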
Re: [R] Please explain do.call in this context, or critique to stack this list faster
On 09/04/2010 01:37 PM, Paul Johnson wrote:

> I've been doing some consulting with students who seem to come to R
> from SAS. They are usually pre-occupied with do loops and it is tough
> to persuade them to trust R lists rather than keeping 100s of named
> matrices floating around. Often it happens that there is a list with
> lots of matrices or data frames in it and we need to stack those
> together. I thought it would be a simple thing, but it turns out there
> are several ways to get it done, and in this case, the most elegant
> way using do.call is not the fastest, but it does appear to be the
> least prone to programmer error.
>
> I have been staring at ?do.call for quite a while and I have to admit
> that I just need some more explanations in order to interpret it. I
> can't really get why this does work
>
> do.call( rbind, mylist)

do.call is *constructing* a function call from the list of arguments, mylist. It is shorthand for

rbind(mylist[[1]], mylist[[2]], mylist[[3]])

assuming mylist has 3 elements.

> but it does not work to do
>
> sapply( mylist, rbind)

That's because sapply is calling rbind once for each item in mylist, which is not what you want to do to accomplish your goal. It might help to use a debugging technique to watch when rbind gets called, and see how many times it gets called and with what arguments under those two approaches.

> Anyway, here's the self contained working example that compares the
> speed of various approaches. If you send yet more ways to do this, I
> will add them on and then post the result to my Working Example
> collection.
>
> ## stackMerge.R
> ## Paul Johnson pauljohn at ku.edu
> ## 2010-09-02
>
> ## rbind is neat, but how to do it to a lot of
> ## data frames?
> ## Here is a test case
> df1 <- data.frame(x=rnorm(100), y=rnorm(100))
> df2 <- data.frame(x=rnorm(100), y=rnorm(100))
> df3 <- data.frame(x=rnorm(100), y=rnorm(100))
> df4 <- data.frame(x=rnorm(100), y=rnorm(100))
> mylist <- list(df1, df2, df3, df4)
>
> ## Usually we have done a stupid
> ## loop to get this done
> resultDF <- mylist[[1]]
> for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])
>
> ## My intuition was that this should work:
> ## lapply( mylist, rbind )
> ## but no! It just makes a new list
>
> ## This obliterates the columns
> ## unlist( mylist )
>
> ## I got this idea from code in the
> ## complete function in the mice package
> ## It uses brute force to allocate a big matrix of 0's and
> ## then it places the individual data frames into that matrix.
> m <- 4
> nr <- nrow(df1)
> nc <- ncol(df1)
> dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
> for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
>
> ## I searched a long time for an answer that looked better.
> ## This website is helpful:
> ## http://stackoverflow.com/questions/tagged/r
> ## I started to type in the question and 3 plausible answers
> ## popped up before I could finish.
> ## The terse answer is:
> shortAnswer <- do.call(rbind, mylist)
>
> ## That's the right answer, see:
> shortAnswer == dataComplete
>
> ## But I don't understand why it works.
> ## More importantly, I don't know if it is fastest, or best.
> ## It is certainly less error prone than dataComplete
>
> ## First, make a bigger test case and use system.time to evaluate
> phony <- function(i){
>   data.frame(w=rnorm(1000), x=rnorm(1000), y=rnorm(1000), z=rnorm(1000))
> }
> mylist <- lapply(1:1000, phony)
>
> ### First, try the terse way
> system.time( shortAnswer <- do.call(rbind, mylist) )
>
> ### Second, try the complete way:
> m <- 1000
> nr <- nrow(df1)
> nc <- ncol(df1)
> system.time(
>   dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
> )
> system.time(
>   for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
> )
>
> ## On my Thinkpad T62 dual core, the shortAnswer approach takes about
> ## three times as long:
>
> ## system.time( bestAnswer <- do.call(rbind, mylist) )
> ##    user  system elapsed
> ##  14.270   1.170  15.433
>
> ## system.time(
> ## +  dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
> ## + )
> ##    user  system elapsed
> ##   0.000   0.000   0.006
>
> ## system.time(
> ## +  for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
> ## + )
> ##    user  system elapsed
> ##   4.940   0.050   4.989
>
> ## That makes the do.call way look slow, and I said hey,
> ## our stupid for loop at the beginning may not be so bad.
> ## Wrong. It is a disaster. Check this out:
>
> ## resultDF <- phony(1)
> ## system.time(
> ## +  for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
> ## + )
> ##    user  system elapsed
> ## 159.740   4.150 163.996
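A minimal sketch of the distinction described above: do.call builds one call with every list element as an argument, while lapply/sapply call the function once per element, which is why `sapply(mylist, rbind)` cannot stack anything.

```r
mylist <- list(data.frame(x = 1:2), data.frame(x = 3:4), data.frame(x = 5:6))

# do.call(rbind, mylist) is literally rbind(mylist[[1]], mylist[[2]], mylist[[3]])
stacked <- do.call(rbind, mylist)
nrow(stacked)      # 6: one stacked data frame

# lapply calls rbind(mylist[[i]]) separately for each i, so each data
# frame is "bound" by itself and you simply get a list back
relisted <- lapply(mylist, rbind)
length(relisted)   # 3: still a list, nothing was stacked
```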
Re: [R] Please explain do.call in this context, or critique to stack this list faster
To echo what Erik said, the second argument of do.call(), args, takes a list of arguments that it passes to the specified function. Since rbind() can bind any number of data frames, each data frame in mylist is rbind()ed at once. These two calls should take about the same time (except for time saved typing):

rbind(mylist[[1]], mylist[[2]], mylist[[3]], mylist[[4]])  # 1
do.call(rbind, mylist)  # 2

On my system using:

set.seed(1)
dat <- rnorm(10^6)
df1 <- data.frame(x=dat, y=dat)
mylist <- list(df1, df1, df1, df1)

they do take about the same time (I started two instances of R and ran both calls but switched the order, because R has a way of being faster the second time you do the same thing).

[1] Order: 1, 2
   user  system elapsed
   0.60    0.14    0.75
   user  system elapsed
   0.41    0.14    0.54
[1] Order: 2, 1
   user  system elapsed
   0.56    0.21    0.76
   user  system elapsed
   0.41    0.14    0.55

Using the for loop is much slower in your later example because rbind() is getting called over and over, plus you are incrementally increasing the size of the object containing your results.

> Often it happens that there is a list with lots of matrices or data
> frames in it and we need to stack those together

For my own curiosity, are you reading in a bunch of separate data files, or are these the results of various operations that you eventually want to combine?

Cheers,

Josh

On Sat, Sep 4, 2010 at 11:37 AM, Paul Johnson pauljoh...@gmail.com wrote:

> [original message and stackMerge.R example snipped -- quoted in full above]
Re: [R] Please explain do.call in this context, or critique to stack this list faster
Paul;

There is another group of functions that are similar to do.call in their serial application of a function to a list or vector. They are somewhat more tolerant, in that dyadic operators can be used as the function argument, whereas do.call is really just expanding the second argument. The one that is _most_ similar is Reduce():

?Reduce

A somewhat smaller example than yours...

df1 <- data.frame(x=rnorm(5), y=rnorm(5))
df2 <- data.frame(x=rnorm(5), y=rnorm(5))
df3 <- data.frame(x=rnorm(5), y=rnorm(5))
df4 <- data.frame(x=rnorm(5), y=rnorm(5))
mylist <- list(df1, df2, df3, df4)

Reduce(rbind, mylist)
            x           y
1 -0.40175483 -0.96187409
2  0.76629538  0.92201312
3  2.44535842  0.90634825
4  0.57784258 -2.12756145
5 -1.62083235 -0.96310011
6  0.02625574  1.17684408
7  1.52412427 -0.26432372
snipped remaining rows

do.call("+", list(1:3))
[1] 1 2 3

do.call("+", list(a=1:3, b=3:5))
[1] 4 6 8

do.call("+", list(a=1:3, b=3:5, cc=7:9))
Error in `+`(a = 1:3, b = 3:5, cc = 7:9) :
  operator needs one or two arguments

Reduce("+", list(a=1:3, b=3:5, cc=7:9))
[1] 11 14 17

Reduce also has the capability of accumulate-ing its intermediate results:

Reduce("+", 1:10)
[1] 55

Reduce("+", 1:10, accumulate=TRUE)
[1]  1  3  6 10 15 21 28 36 45 55

On Sep 4, 2010, at 4:41 PM, Joshua Wiley wrote:

> [message quoting Paul Johnson's original snipped -- see above]
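One caveat worth sketching: Reduce(rbind, mylist) gives the same rows as do.call(rbind, mylist), but it folds pairwise, so it re-copies the accumulated result at every step, much like the explicit for loop earlier in the thread.

```r
l <- replicate(20, data.frame(x = rnorm(5), y = rnorm(5)), simplify = FALSE)

# Reduce folds pairwise: rbind(rbind(rbind(l[[1]], l[[2]]), l[[3]]), ...)
# Each step copies everything accumulated so far.
r1 <- Reduce(rbind, l)

# do.call makes a single 20-argument call, allocating the result once.
r2 <- do.call(rbind, l)

# Same stacked values either way; only the cost profile differs.
all.equal(r1$x, r2$x)
```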
Re: [R] Please explain do.call in this context, or critique to stack this list faster
On Sat, Sep 4, 2010 at 2:37 PM, Paul Johnson pauljoh...@gmail.com wrote:

> I've been doing some consulting with students who seem to come to R
> from SAS. They are usually pre-occupied with do loops and it is tough
> to persuade them to trust R lists rather than keeping 100s of named
> matrices floating around. Often it happens that there is a list with
> lots of matrices or data frames in it and we need to stack those
> together. I thought it

This has nothing specifically to do with do.call, but note that R is faster at handling matrices than data frames. Below we see that rbind-ing 4 data frames takes over 100 times as long as rbind-ing matrices with the same data:

mylist <- list(iris[-5], iris[-5], iris[-5], iris[-5])
L <- lapply(mylist, as.matrix)
library(rbenchmark)
benchmark(
+   df = do.call(rbind, mylist),
+   mat = do.call(rbind, L),
+   order = "relative", replications = 250
+ )
  test replications elapsed relative user.self sys.self user.child sys.child
2  mat          250    0.01        1      0.02     0.00         NA        NA
1   df          250    1.06      106      1.03     0.01         NA        NA

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
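A sketch of putting that observation to work when every column is numeric: stack via matrices, then convert back once at the end. (Caveat: as.matrix coerces all columns to a common mode, so this detour is only safe for type-homogeneous data.)

```r
mylist <- replicate(100, data.frame(x = rnorm(10), y = rnorm(10)),
                    simplify = FALSE)

# Matrix rbind skips the per-column, per-frame bookkeeping done by the
# data.frame method; one as.data.frame at the end pays the data-frame
# cost only once instead of at every bind.
stacked <- as.data.frame(do.call(rbind, lapply(mylist, as.matrix)))

dim(stacked)   # 1000 rows, 2 columns
```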
Re: [R] Please explain do.call in this context, or critique to stack this list faster
Hi:

Here's my test:

l <- vector('list', 1000)
for (i in seq_along(l)) l[[i]] <- data.frame(x=rnorm(100), y=rnorm(100))

system.time(u1 <- do.call(rbind, l))
   user  system elapsed
   0.49    0.06    0.60

resultDF <- data.frame()
system.time(for (i in 1:1000) resultDF <- rbind(resultDF, l[[i]]))
   user  system elapsed
  10.34    0.06   10.53

identical(u1, resultDF)
[1] TRUE

The problem with the second approach, which is really kind of an FAQ by now, is that repeated application of rbind as a standalone function results in 'Spaceballs: the search for more memory!' The base object gets bigger as the iterations proceed, and something new is being added, so more memory is needed to hold both the old and new objects. This is an inefficient time killer, because as the loop proceeds, increasingly more time is invested in finding new memory. Interestingly, this doesn't scale linearly: if we make a list of 10000 100 x 2 data frames, I get the following:

l <- vector('list', 10000)
for (i in seq_along(l)) l[[i]] <- data.frame(x=rnorm(100), y=rnorm(100))

system.time(u1 <- do.call(rbind, l))
   user  system elapsed
  55.56   30.62   88.11

dim(u1)
[1] 1000000       2
str(u1)
'data.frame':   1000000 obs. of  2 variables:
 $ x: num  -0.9516 -0.6948 0.0523 2.5798 -0.0862 ...
 $ y: num  1.466 0.165 1.375 0.571 -1.099 ...

rm(u1)
rm(resultDF)
resultDF <- data.frame()
# go take a shower and come back
system.time(for (i in 1:10000) resultDF <- rbind(resultDF, l[[i]]))
   user  system elapsed
 977.33  121.41 1130.26

dim(resultDF)
[1] 1000000       2

This time, neither do.call nor iterative rbind did very well. One common way around this is to pre-allocate memory and then to populate the object using a loop, but a somewhat easier solution here turns out to be ldply() in the plyr package. The following is the same idea as do.call(rbind, l), only faster:

system.time(u3 <- ldply(l, rbind))
   user  system elapsed
   6.07    0.01    6.09

dim(u3)
[1] 1000000       2
str(u3)
'data.frame':   1000000 obs. of  2 variables:
 $ x: num  -0.9516 -0.6948 0.0523 2.5798 -0.0862 ...
 $ y: num  1.466 0.165 1.375 0.571 -1.099 ...
HTH,
Dennis

On Sat, Sep 4, 2010 at 11:37 AM, Paul Johnson pauljoh...@gmail.com wrote:

> [original message and stackMerge.R example snipped -- quoted in full above]
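The pre-allocation idea mentioned above can be sketched like this (assuming, as in the tests in this thread, equal-sized all-numeric data frames; the index arithmetic mirrors the mice-style loop from Paul's script):

```r
l <- replicate(100, data.frame(x = rnorm(50), y = rnorm(50)), simplify = FALSE)
nr <- nrow(l[[1]])

# Allocate the full-size result once...
out <- data.frame(x = numeric(length(l) * nr),
                  y = numeric(length(l) * nr))

# ...then fill by index, so no object ever has to grow and nothing
# already written gets recopied.
for (j in seq_along(l)) {
  out[((j - 1) * nr + 1):(j * nr), ] <- l[[j]]
}
```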
Re: [R] Please explain do.call in this context, or critique to stack this list faster
> One common way around this is to pre-allocate memory and then to
> populate the object using a loop, but a somewhat easier solution here
> turns out to be ldply() in the plyr package. The following is the same
> idea as do.call(rbind, l), only faster:
>
> system.time(u3 <- ldply(l, rbind))
>    user  system elapsed
>    6.07    0.01    6.09

I think all you want here is rbind.fill:

system.time(a <- rbind.fill(l))
   user  system elapsed
  1.426   0.044   1.471

system.time(b <- do.call(rbind, l))
   user  system elapsed
     98      60     162

all.equal(a, b)
[1] TRUE

This is considerably faster than do.call + rbind because I spent a lot of time working out how to do this most efficiently. You can see the underlying code at http://github.com/hadley/plyr/blob/master/R/rbind.r - it's relatively straightforward, except for ensuring the output columns are the same type as the input columns. This is a good example where optimised R code is much faster than C code.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
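A small sketch of the behavior rbind.fill is named for, assuming plyr is installed: unlike base rbind, it does not require every data frame to share the same columns; any column missing from a given input is filled with NA.

```r
library(plyr)

a <- data.frame(x = 1:2, y = c(10, 20))
b <- data.frame(x = 3:4, z = c(30, 40))

# base rbind(a, b) would fail here because the names don't match;
# rbind.fill takes the union of the columns and pads with NA
combined <- rbind.fill(a, b)

names(combined)    # "x" "y" "z"
combined$z[1:2]    # NA NA: 'a' had no z column
```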