Hi Sergey,
I believe the code below should get you close to what you want.
For dates, I usually store them as POSIXct objects in data frames, but
according to Gabor Grothendieck and Thomas Petzoldt's R Help Desk article
(http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf), I should probably be
using chron dates and times...
Nonetheless, POSIXct classes are what I know, so I can show you how to get the
month out of your column (replace "8.29.97" with your variable). You can do
the following:
month = format(strptime("8.29.97", format = "%m.%d.%y"), format = "%m")
Or,
month = as.data.frame(strsplit("8.29.97", "\\."))[1, ]
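Because your dates sit in a factor column, note that strsplit() refuses factors ("non-character argument"), so convert with as.character() first. A minimal vectorized sketch using only base R, on a hypothetical vector `dates`; as.Date() stands in for strptime() here because only the calendar date matters:

```r
# Hypothetical example: a factor column of dates stored as "m.d.yy" strings
dates <- factor(c("8.29.97", "9.30.97", "10.31.97"))

# Approach 1: parse to Date, then format out the month number
month1 <- as.integer(format(as.Date(as.character(dates), format = "%m.%d.%y"), "%m"))

# Approach 2: split each string on a literal "." and keep the first piece
month2 <- as.integer(sapply(strsplit(as.character(dates), ".", fixed = TRUE), `[`, 1))

month1                    # 8 9 10
identical(month1, month2) # TRUE
```

Either vector can then be used directly as a grouping factor in split().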
In any case, here is some code in which I follow a series of function
definitions and applications (which effectively amounts to successive
application of split() and lapply()).
Best regards,
ST
# define data (I just made this up)
df <- data.frame(month = as.character(rep(1:3, each = 30)),
                 fac = factor(rep(1:2, each = 15)),
                 data1 = round(runif(90), 2),
                 data2 = round(runif(90), 2))
# define functions to split the data and another
# to get statistics
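# split by month, then split each month by fac, and flatten the result
# into a single list with one data frame per month/fac group
doSplits <- function(df) {
  unlist(lapply(split(df, df$month), function(x) split(x, x$fac)),
         recursive = FALSE)
}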
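getStats <- function(x, f) {
  # keep numeric columns only (factors also have mode "numeric", so drop them)
  isNum <- unlist(lapply(x, mode)) == "numeric" &
           unlist(lapply(x, class)) != "factor"
  return(as.data.frame(lapply(x[isNum], f)))
}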
# create a list-matrix with one row per group and columns holding
# the data, means, and standard deviations
listMatrix <- cbind(Data = doSplits(df),
                    Means = lapply(doSplits(df), getStats, mean),
                    SDs = lapply(doSplits(df), getStats, sd))
# function to subtract means and divide by standard deviations
transformData <- function(x) {
  newdata <- x$Data
  matchedNames <- match(names(x$Means), names(x$Data))
  newdata[matchedNames] <-
    sweep(sweep(data.matrix(x$Data[matchedNames]), 2, unlist(x$Means), "-"),
          2, unlist(x$SDs), "/")
  return(newdata)
}
# apply to data (each column of the transposed list-matrix is one group)
newDF <- lapply(as.data.frame(t(listMatrix)), transformData)
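# Define Fold function: left fold of f over the elements of L, starting at x
Fold <- function(f, x, L) {
  for (e in L) x <- f(x, e)
  x  # return the accumulated value (a for loop on its own returns NULL)
}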
# Apply this to the data
finalData <- Fold(rbind, vector(), newDF)
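For comparison, here is a more compact sketch of the same per-group standardization, assuming the same made-up layout as the script above. It swaps in split() over a list of factors (one call instead of nested splits) and do.call(rbind, ...) instead of Fold(); the helper name standardize() is hypothetical.

```r
# Same made-up layout as the script above: character month, factor fac,
# two numeric columns; set.seed() only makes the random numbers repeatable
set.seed(1)
df <- data.frame(month = as.character(rep(1:3, each = 30)),
                 fac   = factor(rep(1:2, each = 15)),
                 data1 = round(runif(90), 2),
                 data2 = round(runif(90), 2))

# Standardize every numeric column of one group: (value - mean) / sd.
# is.numeric() is FALSE for factors and character vectors, so month and
# fac are left untouched.
standardize <- function(g) {
  num <- sapply(g, is.numeric)
  g[num] <- lapply(g[num], function(v) (v - mean(v)) / sd(v))
  g
}

# One split() call over the month x fac interaction replaces the nested
# split(); do.call(rbind, ...) then stacks all groups in a single step.
groups <- split(df, list(df$month, df$fac))
finalData2 <- do.call(rbind, lapply(groups, standardize))
```

The groups come out ordered by the interaction levels (month varying fastest), so the row order can differ from the Fold() version. Base R's Reduce(rbind, newDF) is another way to express the same left fold as Fold().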
--- Sergey Goriatchev [EMAIL PROTECTED] wrote:
Hi, fellow R users.
I have a question about sapply and split combination.
I have a big dataframe (4 observations, 21 variables). The first
variable (a factor) is the date, in the format 8.29.97; that is, I
have monthly data. The second variable (also a factor) has levels 1 to 6
(fractiles 1 to 5, with code 6 for missing values). The other 19
variables are numeric.
For each month I have several hundred observations of the 19 numeric
variables and 1 factor.
I am normalizing the numeric variables by dividing val1 by val2, where:
val1: (for each month, for each numeric variable) difference between
mean of ith numeric variable in fractile 1, and mean of ith numeric
variable in fractile 5.
val2: (for each month, for each numeric variable) standard deviation
for ith numeric variable.
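This definition maps fairly directly onto split() plus sapply(). A minimal sketch on invented toy data: the column names date, fractile, x1, x2 and the helper normalizeMonth() are hypothetical, and computing val2 over all rows of the month (including rows with fractile code 6) is an assumption, since the post does not say whether those rows should be excluded.

```r
# Hypothetical toy data: date (factor), fractile (factor, 1-5 plus 6 = missing),
# and two numeric variables standing in for the 19 real ones
set.seed(42)
dat <- data.frame(date     = factor(rep(c("8.29.97", "9.30.97"), each = 60)),
                  fractile = factor(rep(1:6, times = 20)),
                  x1       = rnorm(120),
                  x2       = rnorm(120))
numVars <- c("x1", "x2")

# For one month: val1 = mean in fractile 1 minus mean in fractile 5,
# val2 = sd over all rows of that month (assumption: code-6 rows included)
normalizeMonth <- function(m, vars) {
  val1 <- colMeans(m[m$fractile == "1", vars]) -
          colMeans(m[m$fractile == "5", vars])
  val2 <- sapply(m[vars], sd)
  val1 / val2
}

# split() by month, apply per month, and transpose to months x variables.
# Rows follow the factor levels, which for "m.d.yy" strings are sorted
# alphabetically, not chronologically.
ratios <- t(sapply(split(dat, dat$date), normalizeMonth, vars = numVars))
ratios
```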
Basically, as far as I understand, I need to use the split() function
several times.
To calculate val1 I need to use split() twice: first to split by
month and then to split by fractile. Is this even possible (since
after the first application of split() I get a list)?
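It is possible: split() returns a list, and lapply() can split each element again. split() also accepts a list of factors and then splits by their interaction in a single call. A minimal sketch on a hypothetical toy data frame dat:

```r
# Hypothetical toy data frame with a month factor and a fractile factor
dat <- data.frame(month    = factor(rep(c("8.29.97", "9.30.97"), each = 4)),
                  fractile = factor(rep(c(1, 5), times = 4)),
                  x        = 1:8)

# Nested: split by month, then split each month's data frame by fractile
nested <- lapply(split(dat, dat$month), function(m) split(m, m$fractile))
nested[["8.29.97"]][["5"]]$x   # 2 4

# Flat: one split() over the month x fractile interaction
flat <- split(dat, list(dat$month, dat$fractile))
names(flat)   # "8.29.97.1" "9.30.97.1" "8.29.97.5" "9.30.97.5"
```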
Is there a smart way to perform this normalization computation?
My knowledge of R is not so advanced, but I need to know an efficient
way to perform calculations of this kind.
Would really appreciate some help from experienced R users!
Regards,
S
--
Laziness is nothing more than the habit of resting before you get tired.
- Jules Renard (writer)
Experience is one thing you can't get for nothing.
- Oscar Wilde (writer)
When you are finished changing, you're finished.
- Benjamin Franklin (Diplomat)
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.