Also, Julia compiles each function the first time it is called, so you should 
put your code in a function and call it once before timing it to let it 
"warm up". You can also profile the code and use ProfileView.jl to see where 
the bottleneck is. There are many other tricks for making things faster.
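
As a rough sketch of the warm-up pattern (sum_of_squares is a made-up stand-in 
for your regression loop, not code from your script), something like this 
should time only the compiled code. Wrapping the work in a function also keeps 
the loop variables out of global scope, which is what the performance tip 
Yichao linked is about.

```julia
# Hypothetical workload: a plain loop standing in for the real computation.
function sum_of_squares(n)
    total = 0.0
    for i in 1:n
        total += i^2
    end
    return total
end

# The first call compiles the function; its run time includes compilation.
sum_of_squares(10)

# The second call reuses the compiled code, so this measures the loop itself.
elapsed = @elapsed sum_of_squares(10^7)
println("elapsed: ", elapsed, " s")
```

If you time the very first call instead, the compilation cost is included in 
the measurement.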

On Saturday, August 22, 2015 at 5:05:21 PM UTC-6, Yichao Yu wrote:
>
> On Sat, Aug 22, 2015 at 6:55 PM, Michael Wang wrote: 
> > I am new to Julia. I heard that Julia's performance is comparable to C++ 
> > even though it is a high-level language. I tested an example on my machine; 
> > however, the result was that Julia was in the same ballpark as R, not 
> > C++. Here is my code. 
> > R: 
> > ptm <- proc.time() 
> > 
> > DPY <- 252  ## days per year 
> > NWINDOW <- 126  ## can be smaller or larger than 252 
> > 
> > ds <- read.csv("xri.csv")  ## a sample data set 
> > 
> > ## PS: this is much faster than assigning to a data frame in a loop 
> > b.ols <- sd.ols <- rep(NA, nrow(ds)) 
> > 
> > for (i in 1:nrow(ds)) { 
> >     thisday <- ds$day[i] 
> >     if (thisday %% DPY != 0) next  ## calculate only on year end 
> >     if (thisday < DPY) next  ## start only without NA 
> >     thisfm <- ds$fm[i] 
> >     datasubset <- subset(ds, (ds$fm == thisfm) & 
> >                          (ds$day >= (thisday - NWINDOW)) & 
> >                          (ds$day <= (thisday - 1))) 
> >     olsreg <- lm(xr ~ xm, data = datasubset) 
> >     b.ols[i] <- coef(olsreg)[2] 
> >     sd.ols[i] <- sqrt(vcov(olsreg)[2, 2]) 
> >     cat(i, " ")  ## ping me to see we are not dead for large data sets 
> > } 
> > 
> > ds$b.ols <- b.ols 
> > ds$sd.ols <- sd.ols 
> > 
> > cat("\nOLS Beta Regressions are Done\n") 
> > 
> > ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T)) 
> > ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T)) 
> > 
> > cat("Cross-Sectional OLS Statistics are Done\n") 
> > 
> > ds <- within(ds, { 
> >                  w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2) 
> >                  b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4) 
> >                  b.ols <- round(b.ols,4) 
> >              }) 
> > 
> > cat("OLS and VCK are Done.  Now Writing Output.\n") 
> > 
> > proc.time() - ptm 
> > 
> > 
> > The running time is around 30 seconds for R. 
> > 
> > Julia: 
> > using DataFrames 
> > using DataFramesMeta 
> > using GLM 
> > 
> > tic() 
> > DPY = 252  ## days per year 
> > NWINDOW = 126  ## can be smaller or larger than 252 
> > 
> > ds = readtable("xri.csv")  ## a sample data set 
> > 
> > # create two empty arrays to store b_ols and sd_ols value 
> > b_ols = DataArray(Float64, size(ds)[1]) 
> > sd_ols = DataArray(Float64, size(ds)[1]) 
> > 
> > for i = 1:size(ds)[1] 
> >     thisDay = ds[i, :day]  ## Julia DataFrame way of accessing data, in R: ds$day[i] 
> >     if mod(thisDay, DPY) != 0 
> >         continue 
> >     end 
> >     if thisDay < DPY 
> >         continue 
> >     end 
> >     thisFm = ds[i, :fm] 
> >     dataSubset = @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW)) & 
> >                         (:day .<= (thisDay - 1))) 
> >     olsReg = fit(LinearModel, xr ~ xm, dataSubset)  ## OLS from package GLM 
> >     b_ols[i] = coef(olsReg)[2]  ## returns the OLS coefficients 
> >     sd_ols[i] = stderr(olsReg)[2]  ## returns the OLS coefficients' standard error 
> >     print(i, " ") 
> > end 
> > 
> > ds[:b_ols] = b_ols 
> > ds[:sd_ols] = sd_ols 
> > 
> > print("\nOLS Beta Regressions are Done\n") 
> > 
> > ds = join(ds, by(ds, :day) do ds 
> >     DataFrame(xsect_mean = mean(dropna(ds[:b_ols])), 
> >               xsect_sd = std(dropna(ds[:b_ols]))) 
> > end, on = [:day], kind = :inner) 
> > ds = sort!(ds) 
> > 
> > print("Cross-Sectional OLS Statistics are Done\n") 
> > 
> > ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2)) 
> > ds[:b_vck] = @with(ds, round(:w_ols .* :b_ols + (1 - :w_ols) .* :xsect_mean, 4)) 
> > ds[:b_ols] = @with(ds, round(:b_ols, 4)) 
> > 
> > print("OLS and VCK are Done.  Now Writing Output.\n") 
> > 
> > toc() 
> > 
> > The running time is around 15 seconds for Julia. 
>
> Not really familiar with DataFrames etc., but the most obvious thing is to 
> avoid global variables: 
>
> http://julia.readthedocs.org/en/latest/manual/performance-tips/#avoid-global-variables
>
> > 
> > I tried C++, too. Having the same output with R and Julia, C++ only used 
> > 0.23 seconds. Can someone tell me why this is happening? 
>
