Two questions:
1) Are there any good R guides or sites with techniques for dealing with
large datasets in R? (Large here being ~2 million rows and ~200 columns.)

2) My specific problem with this dataset.

I am essentially trying to convert a date column and add the result to a
data frame. I imagine any computation on a column of a data frame that is
assigned back as a new column will hit the same issue, whether it is
as.Date() or anything else.
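
(To make the general pattern concrete, a toy example; the data here is
made up:)

df <- data.frame(x = 1:5)
df$y <- df$x * 2   # any per-column computation assigned back as a new column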

I have a dataset of this size:

> dim(morbidity)
[1] 1775683     264

This was read in from a Stata .dta file. The dates have come in as the
number of milliseconds since 1960-01-01 (presumably Stata's %tc datetime
encoding), so I have the following to convert them to usable dates:

as.Date(morbidity$adm_date / (1000*60*60*24), origin = "1960-01-01")  # 86,400,000 ms/day
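
(As a quick sanity check of the divisor, assuming the millisecond
encoding above:)

ms_per_day <- 1000 * 60 * 60 * 24                        # 86,400,000
as.Date(0 / ms_per_day, origin = "1960-01-01")           # "1960-01-01"
as.Date(ms_per_day / ms_per_day, origin = "1960-01-01")  # "1960-01-02"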

When I store the result as a standalone vector it is near-instant, <5 seconds:

test <- as.Date(etc)

When I assign it back over the original column it takes ~20 minutes:

morbidity$adm_date <- as.Date(etc)

When I assign the precomputed vector over the column (so no computation is
involved), or add it as a new column, it still takes ~20 minutes:

morbidity$adm_date <- test
morbidity$new_col <- test

When I tried cbind() to add it that way it took >20 minutes:

new_morb <- cbind(morbidity, test)
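
(For anyone who wants to reproduce the pattern without the real data, a
scaled-down version of what I am doing; the sizes, values, and 50-column
filler are made up:)

n <- 100000                                    # scaled down from ~1.77M rows
morb_small <- as.data.frame(matrix(rnorm(n * 50), nrow = n))
morb_small$adm_date <- runif(n, 0, 1.5e12)     # fake ms-since-1960 values
ms_per_day <- 1000 * 60 * 60 * 24
system.time(test <- as.Date(morb_small$adm_date / ms_per_day,
                            origin = "1960-01-01"))     # vector: fast
system.time(morb_small$adm_date <- test)                # assign over column
system.time(morb_small$new_col <- test)                 # add as new column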

Has anyone done something similar, or does anyone know of a different
command that should work faster? I can't get my head around what R is
doing: if it can create the vector almost instantly then the computation
itself is cheap, so I don't understand why adding it as a column of the
data frame can take that long.
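
(A sketch of one possible alternative, using the data.table package to
assign the column by reference rather than copying the data frame; this is
an assumption about what might help, not something I have timed:)

library(data.table)
morb_dt <- as.data.table(morbidity)          # one-off conversion
set(morb_dt, j = "adm_date", value = test)   # assigns by reference, no full copy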

64-bit R on Mac OS X, 2.4 GHz dual core, 8 GB RAM, so there should be more
than enough resources.

Thanks
Matt
