Hi Andreas,

Please post a sample of your data, and show how you want it to look after the manipulation. Consider using dput() for this (see ?dput).
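As a rough sketch of what that could look like: the data frame below is a made-up stand-in built from the three rows in your post (the names `df`, `V2`, `txt` and `df2` are assumptions for illustration only, not your real objects).

```r
# Hypothetical stand-in for the real survey data; the values are copied from
# the three example rows in the original post, and V2 stands in for "V...".
df <- data.frame(
  Date     = c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06"),
  V1       = c(1, 3, 3),
  V2       = c(3, 3, 2),
  VN       = c(2, 1, 1),
  Region   = c(15, 9, 3),
  Industry = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

# Print a plain-text representation of the first rows; this is the text
# to paste into the email.
dput(head(df, 20))

# A reader can rebuild the object by evaluating the pasted text.
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))

# Compare the rebuilt object with the original.
all.equal(df, df2)
```

Pasting the output of dput() rather than a printed table lets others recreate the object with the same column types, which matters here because the Date column may be stored as character, factor or a date-time class.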
----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------

On Tue, Jan 24, 2012 at 11:54 AM, ak13 <andreas.ka...@gmail.com> wrote:

> Hi,
>
> I am a total newbie to R, so I apologize if the answer to my question is too
> obvious. I have a data set of the following form:
>
> Date                 V1  V...  VN  Region  Industry
> 22/03/1995 23:01:12  1   3     2   15      A
> 21/03/1995 21:01:12  3   3     1   9       C
> 1/04/1995 17:01:06   3   2     1   3       B
>
> Now I would like to analyze the data in the data.frame by Region, Industry,
> Date (I would like to collapse the whole thing to weekly data) and by the
> three different answering options {1,2,3} in V1...VN. In Stata, which I used
> before, I did this step by step with a loop over all questions (V1...VN):
>
> egen pos_`X'=total(`X'==1), by(industry week_year);
> egen pos_`X'=total(`X'==2), by(industry week_year).
>
> This step-by-step procedure works because Stata, even if the dates are
> displayed as weeks, doesn't aggregate the values immediately. Unfortunately
> there seems to be no command in R which works in exactly the same manner as
> by() in Stata. My most successful attempt so far to accomplish the task
> described above was:
>
> as.data.frame(tapply(euwifo[,1]==1, list(df$date, df$region, df$industry),
> mean))
>
> (where date is formatted as ISO-weekly %U)
>
> Of course I would have to loop this over all questions (20) and all
> answering possibilities (3), but at least it gives me an output of the
> structure:
>
> .        industry.region  Industry.region  industry.region  industry.region
> 10-1995  32               45               10               9
> 15-1995  2                47               5                6
>
> I could live with that, because I could recombine the different data frames
> created this way afterwards. My problem, however, is that tapply doesn't
> preserve the data frame's format as a time series (xts). This means R
> aggregates by time (week) (and industry and region), but the weeks on the
> x-axis are not in the right order. I also tried apply.weekly(), but this
> doesn't seem to do what I want.
>
> Could anyone give me a hint how I could do this? Maybe by formatting the
> data frame as time series data beforehand and preserving this during the
> procedure. And maybe somebody also has an idea how I can avoid all this
> looping.
>
> I would appreciate it very much if one of you could give me a hint!
>
> Best regards,
>
> Andreas
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Splitting-up-large-set-of-survey-data-into-categories-tp4323327p4323327.html
> Sent from the R help mailing list archive at Nabble.com.
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.