On Sat, Aug 22, 2015 at 6:55 PM, Michael Wang <[email protected]> wrote:
> I am new to Julia. I heard that Julia has the performance with Cpp even
> though it is a high level language. I tested an example on my machine,
> however, the result was that Julia was in the same ballpark with R not with
> Cpp. Here is my codes.
> R:
> ptm <- proc.time()
>
> DPY <- 252 ## days per year
> NWINDOW <- 126 ## can be smaller or larger than 252
>
> ds <- read.csv("xri.csv") ## a sample data set
>
> ## PS: this is much faster than assigning to a data frame in a loop
> b.ols <- sd.ols <- rep(NA, nrow(ds))
>
> for (i in 1:nrow(ds)) {
>   thisday <- ds$day[i]
>   if (thisday %% DPY != 0) next ## calculate only on year end
>   if (thisday < DPY) next ## start only without NA
>   thisfm <- ds$fm[i]
>   datasubset <- subset( ds, (ds$fm==thisfm) & (ds$day>=(thisday-NWINDOW))
>     & (ds$day<=(thisday-1)) )
>   olsreg <- lm(xr ~ xm, data = datasubset)
>   b.ols[i] <- coef(olsreg)[2]
>   sd.ols[i] <- sqrt(vcov(olsreg)[2, 2])
>   cat(i, " ") ## ping me to see we are not dead for large data sets
> }
>
> ds$b.ols <- b.ols
> ds$sd.ols <- sd.ols
>
> cat("\nOLS Beta Regressions are Done\n")
>
> ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T))
> ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T))
>
> cat("Cross-Sectional OLS Statistics are Done\n")
>
> ds <- within(ds, {
>   w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2)
>   b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4)
>   b.ols <- round(b.ols,4)
> })
>
> cat("OLS and VCK are Done. Now Writing Output.\n")
>
> proc.time() - ptm
>
>
> The running time is around 30 seconds for R.
>
> Julia:
> using DataFrames
> using DataFramesMeta
> using GLM
>
> tic()
> DPY = 252 ## days per year
> NWINDOW = 126 ## can be smaller or larger than 252
>
> ds = readtable("xri.csv") ## a sample data set
>
> # create two empty arrays to store b_ols and sd_ols value
> b_ols = DataArray(Float64, size(ds)[1])
> sd_ols = DataArray(Float64, size(ds)[1])
>
> for i = 1:size(ds)[1]
>     thisDay = ds[i, :day] ## Julia DataFrame way of accessing data, in R: ds$day[i]
>     if mod(thisDay, DPY) != 0
>         continue
>     end
>     if thisDay < DPY
>         continue
>     end
>     thisFm = ds[i, :fm]
>     dataSubset = @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW)) &
>         (:day .<= (thisDay - 1)))
>     olsReg = fit(LinearModel, xr ~ xm, dataSubset) ## OLS from package GLM
>     b_ols[i] = coef(olsReg)[2] ## returns the OLS coefficients
>     sd_ols[i] = stderr(olsReg)[2] ## returns the OLS coefficients' standard error
>     print(i, " ")
> end
>
> ds[:b_ols] = b_ols
> ds[:sd_ols] = sd_ols
>
> print("\nOLS Beta Regressions are Done\n")
>
> ds = join(ds, by(ds, :day) do ds
>     DataFrame(xsect_mean = mean(dropna(ds[:b_ols])), xsect_sd =
>         std(dropna(ds[:b_ols])))
> end, on = [:day], kind = :inner)
> ds = sort!(ds)
>
> print("Cross-Sectional OLS Statistics are Done\n")
>
> ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2))
> ds[:b_vck] = @with(ds, round(:w_ols .* :b_ols + (1 - :w_ols) .* :xsect_mean,
>     4))
> ds[:b_ols] = @with(ds, round(:b_ols, 4))
>
> print("OLS and VCK are Done. Now Writing Output.\n")
>
> toc()
>
> The running time is around 15 seconds for Julia.

I'm not really familiar with DataFrames etc., but the most obvious thing is
that the whole script runs at global scope with non-constant globals; see
http://julia.readthedocs.org/en/latest/manual/performance-tips/#avoid-global-variables

>
> I tried C++, too. Having the same output with R and Julia, C++ only used
> 0.23 seconds. Can someone tell me why this is happening?
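
To make the linked performance tip concrete, here is a minimal sketch of the
pattern it recommends: put the loop in a function, pass in what it needs, and
mark any remaining globals `const`. The data is made up and `year_end_rows` is
a hypothetical helper name, not something from DataFrames or GLM. It only
mirrors the row-selection part of your loop, and I have not tried it on your
data set, so I cannot say how much of the 15 seconds it would recover.

```julia
# Sketch only: year_end_rows is a hypothetical helper, not a package function.
# It returns the indices of the rows the original loop would process
# (days that are a multiple of dpy, starting at dpy).
function year_end_rows(days, dpy)
    rows = Int[]
    for i in 1:length(days)
        d = days[i]
        # Same two skip conditions as in the original loop.
        if mod(d, dpy) != 0 || d < dpy
            continue
        end
        push!(rows, i)
    end
    return rows
end

# `const` tells the compiler the type of this global will not change.
const DPY = 252

# Made-up input, for illustration only.
days = [1, 251, 252, 253, 504, 600]

# The loop now runs inside a function with typed arguments,
# not at global scope.
rows = year_end_rows(days, DPY)   # rows == [3, 5] for this made-up input
println(rows)
```

When timing something like this, the first call to a function also includes
compilation time, so it is usually more informative to time a second call.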
