It is of course possible to write slow Julia code. Most of the time in your code is spent in the @where macro from DataFramesMeta, which runs once per loop iteration, so that is the first place to look for improvements.
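As a rough illustration only (I have not timed this against your data), here is a minimal sketch of doing the row selection with a plain loop inside a function instead of calling @where on every iteration. The helper name `window_rows` and the tiny `fms`/`days` vectors are made up for the example; they stand in for the `fm` and `day` columns of your data set. Putting the work in a function also covers the two points quoted below: no global variables in the hot loop, and a first call to warm up before timing.

```julia
# Hypothetical helper (not part of DataFrames or DataFramesMeta): returns the
# indices of rows that belong to firm `fm` and whose day lies in the window
# [day - nwindow, day - 1]. It works on plain vectors and uses no macro.
function window_rows(fms, days, fm, day, nwindow)
    rows = Int[]
    for j in 1:length(days)
        if fms[j] == fm && day - nwindow <= days[j] <= day - 1
            push!(rows, j)
        end
    end
    return rows
end

# Made-up data, only to show the call.
fms  = [1, 1, 1, 2, 2, 1]
days = [1, 2, 3, 2, 3, 4]

window_rows(fms, days, 1, 4, 2)        # first call also compiles the function
@time window_rows(fms, days, 1, 4, 2)  # later calls measure only the run time
```

The returned indices could then be used to pick the rows for the regression. Whether this actually beats @where in your case is something to check with the profiler.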
On Sunday, August 23, 2015 at 7:30:59 AM UTC+2, Cedric St-Jean wrote:
>
> Also, Julia compiles code on the fly, so you should put it in a function,
> then call it once before timing it to let it "warm up". You can also
> profile the code and use ProfileView.jl to view where the bottleneck is.
> There are many tricks to making things faster.
>
> On Saturday, August 22, 2015 at 5:05:21 PM UTC-6, Yichao Yu wrote:
>>
>> On Sat, Aug 22, 2015 at 6:55 PM, Michael Wang
>> <[email protected]> wrote:
>> > I am new to Julia. I heard that Julia has the performance with Cpp even
>> > though it is a high level language. I tested an example on my machine,
>> > however, the result was that Julia was in the same ballpark with R not with
>> > Cpp. Here is my codes.
>> > R:
>> > ptm <- proc.time()
>> >
>> > DPY <- 252  ## days per year
>> > NWINDOW <- 126  ## can be smaller or larger than 252
>> >
>> > ds <- read.csv("xri.csv")  ## a sample data set
>> >
>> > ## PS: this is much faster than assigning to a data frame in a loop
>> > b.ols <- sd.ols <- rep(NA, nrow(ds))
>> >
>> > for (i in 1:nrow(ds)) {
>> >     thisday <- ds$day[i]
>> >     if (thisday %% DPY != 0) next  ## calculate only on year end
>> >     if (thisday < DPY) next  ## start only without NA
>> >     thisfm <- ds$fm[i]
>> >     datasubset <- subset( ds, (ds$fm==thisfm) & (ds$day>=(thisday-NWINDOW))
>> >                           & (ds$day<=(thisday-1)) )
>> >     olsreg <- lm(xr ~ xm, data = datasubset)
>> >     b.ols[i] <- coef(olsreg)[2]
>> >     sd.ols[i] <- sqrt(vcov(olsreg)[2, 2])
>> >     cat(i, " ")  ## ping me to see we are not dead for large data sets
>> > }
>> >
>> > ds$b.ols <- b.ols
>> > ds$sd.ols <- sd.ols
>> >
>> > cat("\nOLS Beta Regressions are Done\n")
>> >
>> > ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T))
>> > ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T))
>> >
>> > cat("Cross-Sectional OLS Statistics are Done\n")
>> >
>> > ds <- within(ds, {
>> >     w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2)
>> >     b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4)
>> >     b.ols <- round(b.ols,4)
>> > })
>> >
>> > cat("OLS and VCK are Done. Now Writing Output.\n")
>> >
>> > proc.time() - ptm
>> >
>> >
>> > The running time is around 30 seconds for R.
>> >
>> > Julia:
>> > using DataFrames
>> > using DataFramesMeta
>> > using GLM
>> >
>> > tic()
>> > DPY = 252  ## days per year
>> > NWINDOW = 126  ## can be smaller or larger than 252
>> >
>> > ds = readtable("xri.csv")  ## a sample data set
>> >
>> > # create two empty arrays to store b_ols and sd_ols value
>> > b_ols = DataArray(Float64, size(ds)[1])
>> > sd_ols = DataArray(Float64, size(ds)[1])
>> >
>> > for i = 1:size(ds)[1]
>> >     thisDay = ds[i, :day]  ## Julia DataFrame way of accessing data, in R: ds$day[i]
>> >     if mod(thisDay, DPY) != 0
>> >         continue
>> >     end
>> >     if thisDay < DPY
>> >         continue
>> >     end
>> >     thisFm = ds[i, :fm]
>> >     dataSubset = @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW)) &
>> >                         (:day .<= (thisDay - 1)))
>> >     olsReg = fit(LinearModel, xr ~ xm, dataSubset)  ## OLS from package GLM
>> >     b_ols[i] = coef(olsReg)[2]  ## returns the OLS coefficients
>> >     sd_ols[i] = stderr(olsReg)[2]  ## returns the OLS coefficients' standard error
>> >     print(i, " ")
>> > end
>> >
>> > ds[:b_ols] = b_ols
>> > ds[:sd_ols] = sd_ols
>> >
>> > print("\nOLS Beta Regressions are Done\n")
>> >
>> > ds = join(ds, by(ds, :day) do ds
>> >     DataFrame(xsect_mean = mean(dropna(ds[:b_ols])), xsect_sd = std(dropna(ds[:b_ols])))
>> > end, on = [:day], kind = :inner)
>> > ds = sort!(ds)
>> >
>> > print("Cross-Sectional OLS Statistics are Done\n")
>> >
>> > ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2))
>> > ds[:b_vck] = @with(ds, round(:w_ols .* :b_ols + (1 - :w_ols) .* :xsect_mean, 4))
>> > ds[:b_ols] = @with(ds, round(:b_ols, 4))
>> >
>> > print("OLS and VCK are Done. Now Writing Output.\n")
>> >
>> > toc()
>> >
>> > The running time is around 15 seconds for Julia.
>>
>> Not really familiar with DataFrame etc, the most obvious thing is
>>
>> http://julia.readthedocs.org/en/latest/manual/performance-tips/#avoid-global-variables
>>
>> .
>>
>> >
>> > I tried C++, too. Having the same output with R and Julia, C++ only used
>> > 0.23 seconds. Can someone tell me why this is happening?
>> >
