Two questions:

1) Are there any good R guides/sites with information/techniques for dealing with large datasets in R? ("Large" here being ~2 million rows and ~200 columns.)
2) My specific problem with this dataset: I am essentially trying to convert a date column and add the result back to the data frame. I imagine any data manipulation on a column of a data frame into a new column will present the same issue, be it as.Date() or anything else.

The dataset looks like this:

    > dim(morbidity)
    [1] 1775683     264

It was read in from a Stata .dta file. The dates have come in as the number of milliseconds since 1960-01-01, so I use the following to convert them to usable dates (100*10*60*60*24 = 86,400,000 ms per day):

    as.Date(morbidity$adm_date / (100*10*60*60*24), origin = "1960-01-01")

When I store the result as a free-standing vector it is near instant, under 5 seconds:

    test <- as.Date(...)

When I assign it over the existing column it takes ~20 minutes:

    morbidity$adm_date <- as.Date(...)

When I assign the precomputed vector over the column (so no computation is involved), or add it as a new column, it still takes ~20 minutes:

    morbidity$adm_date <- test
    morbidity$new_col <- test

When I tried cbind() to add it that way, it also took more than 20 minutes:

    new_morb <- cbind(morbidity, test)

Has anyone done something similar, or does anyone know of a different command that should work faster? I can't get my head around what R is doing: if it can create the vector almost instantly, the computation itself must be cheap, so I don't understand why adding the result as a column to the data frame can take that long.

This is 64-bit R on Mac OS X, 2.4 GHz dual core, 8 GB RAM, so more than enough resources.

Thanks,
Matt
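P.S. For anyone who wants to try this, here is a minimal, self-contained sketch of the timing comparison on synthetic data (the sizes are reduced and the values made up purely for illustration; system.time() is base R):

    # synthetic stand-in for the morbidity data frame (made-up values, fewer columns for speed)
    n <- 1e6
    morb <- as.data.frame(replicate(20, rnorm(n)))
    # fake admission dates stored as milliseconds since 1960-01-01
    morb$adm_date <- runif(n, 0, 50 * 365) * (100*10*60*60*24)

    # step 1: the conversion alone, stored as a free-standing vector
    system.time(
        test <- as.Date(morb$adm_date / (100*10*60*60*24), origin = "1960-01-01")
    )

    # step 2: assigning the precomputed vector back over the column
    system.time(morb$adm_date <- test)

    # step 3: adding it as a new column instead
    system.time(morb$new_col <- test)

The arithmetic is identical to the real case; only the size differs. On the full data frame, steps 2 and 3 are the ones that take ~20 minutes for me, even though step 1 is near instant.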