Hello there,
I'm computing the total value of each order from the prices of its items, using a for loop and the ifelse() function. I do this on a large data frame (close to 1 million rows). The computation is painfully slow: in 1 minute only about 90 rows are calculated. Moreover, the time needed to process a fixed number of rows grows with the total size of the data frame, as the example below shows:

# Small dataset: the loop performs well
exampledata <- data.frame(orderID = c(1,1,1,2,2,3,3,3,4),
                          itemPrice = c(10,17,9,12,25,10,1,9,7))
# The first row starts the running total
exampledata[1, "orderAmount"] <- exampledata[1, "itemPrice"]
system.time(
  for (i in 2:nrow(exampledata)) {
    # Same order as the previous row: add this price to the running total;
    # otherwise start a new total with this price
    exampledata[i, "orderAmount"] <- ifelse(
      exampledata[i, "orderID"] == exampledata[i - 1, "orderID"],
      exampledata[i - 1, "orderAmount"] + exampledata[i, "itemPrice"],
      exampledata[i, "itemPrice"])
  }
)

# Large dataset: the very same computation on the same first 9 rows takes much longer
exampledata2 <- data.frame(orderID = c(1,1,1,2,2,3,3,3,4,5:2000000),
                           itemPrice = c(10,17,9,12,25,10,1,9,7,25:2000020))
exampledata2[1, "orderAmount"] <- exampledata2[1, "itemPrice"]
system.time(
  for (i in 2:9) {
    exampledata2[i, "orderAmount"] <- ifelse(
      exampledata2[i, "orderID"] == exampledata2[i - 1, "orderID"],
      exampledata2[i - 1, "orderAmount"] + exampledata2[i, "itemPrice"],
      exampledata2[i, "itemPrice"])
  }
)

Does anyone know a way to increase the speed?

Thank you very much!
Caroline
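A likely cause, offered as a hedged sketch rather than a definitive answer: each exampledata[i, "orderAmount"] <- ... assignment goes through the data-frame replacement method, which typically copies the modified column on every iteration, so each step costs time roughly proportional to the number of rows. That would explain why the same 8 rows take longer on the 2-million-row data frame. The sketch below shows two alternatives; the function names running_order_amount_loop() and running_order_amount() are hypothetical. The first runs the same loop on plain vectors and assigns the column once at the end. The second is vectorized: it numbers consecutive runs of orderID and takes cumsum() within each run via ave(). Both assume, like the original loop, that a new order starts whenever orderID differs from the previous row.

```r
# Hypothetical helper: the original loop, but on plain vectors instead of
# data-frame cells, so no data-frame copy happens inside the loop.
running_order_amount_loop <- function(orderID, itemPrice) {
  n <- length(itemPrice)
  if (n == 0L) return(numeric(0))
  out <- numeric(n)
  out[1] <- itemPrice[1]
  for (i in seq_len(n)[-1]) {
    # Same order as the previous row: extend the running total;
    # otherwise start a new total with this item's price
    out[i] <- if (orderID[i] == orderID[i - 1]) out[i - 1] + itemPrice[i] else itemPrice[i]
  }
  out
}

# Hypothetical helper: vectorized version without an explicit loop.
running_order_amount <- function(orderID, itemPrice) {
  n <- length(orderID)
  if (n == 0L) return(numeric(0))
  # TRUE where a new run of orderID starts (first row, or ID differs from previous row)
  new_run <- c(TRUE, orderID[-1] != orderID[-n])
  # Run index: 1 for the first run, 2 for the second, ...
  run_id <- cumsum(new_run)
  # Running (cumulative) sum of itemPrice within each run
  ave(itemPrice, run_id, FUN = cumsum)
}

# Usage on the small example data: compute the whole column in one assignment
exampledata <- data.frame(orderID = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
                          itemPrice = c(10, 17, 9, 12, 25, 10, 1, 9, 7))
exampledata$orderAmount <- running_order_amount(exampledata$orderID,
                                                exampledata$itemPrice)
```

On the small example this should give 10, 27, 36, 12, 37, 10, 11, 20, 7, the same as the original loop. If all rows of an order are guaranteed to be contiguous, ave(exampledata$itemPrice, exampledata$orderID, FUN = cumsum) is a shorter equivalent; grouping by runs is used here so the result matches the loop even if the same orderID reappears later in the data.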
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.