Firstly, read the following thoroughly:
http://docs.julialang.org/en/release-0.3/manual/performance-tips/
Secondly, I'd avoid IO when benchmarking unless IO is a necessary
part of the operation.
My guess is you'll get a big speedup just by putting everything in a
function, since Julia cannot optimise code that works on non-constant
globals nearly as well, but there will be other optimisations to make too.
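To illustrate both points, here is a minimal sketch, not a rewrite of your script. It assumes a recent Julia release and uses only Base. The names ols_slope and rolling_slopes are made up for the example, the data is synthetic rather than read from xri.csv, and the hand-rolled slope formula stands in for the GLM fit. The work lives in functions with concrete argument types, the inputs are built before the timer starts, and the first call is made untimed so that compilation is not measured.

```julia
# Hypothetical stand-in for the per-row regression: the least-squares slope
# of y on x over the index range lo:hi. Not the GLM call from the script.
function ols_slope(x::Vector{Float64}, y::Vector{Float64}, lo::Int, hi::Int)
    n = hi - lo + 1
    sx = 0.0; sy = 0.0; sxx = 0.0; sxy = 0.0
    for i in lo:hi
        sx += x[i]
        sy += y[i]
        sxx += x[i] * x[i]
        sxy += x[i] * y[i]
    end
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)
end

# The whole loop is inside a function, so it works on typed locals
# instead of non-constant globals.
function rolling_slopes(x::Vector{Float64}, y::Vector{Float64}, window::Int)
    out = fill(NaN, length(x))          # NaN marks rows without a full window
    for i in (window + 1):length(x)
        out[i] = ols_slope(x, y, i - window, i - 1)
    end
    return out
end

# Build the inputs up front so that no IO is included in the timing.
x = collect(1.0:1000.0)
y = 2.0 .* x .+ 1.0

slopes = rolling_slopes(x, y, 126)            # first call also compiles
t = @elapsed rolling_slopes(x, y, 126)        # times only the computation
println("elapsed: ", t, " s")
```

If reading the CSV is not what you want to measure, load it once before the timed call, in the same way the arrays are built here.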
On Sunday, 23 August 2015 00:01:30 UTC+1, Michael Wang wrote:
>
> I am new to Julia. I heard that even though Julia is a high-level language,
> it has the speed of C or C++. I tested an example on my machine. With
> the same input and the same output, R takes around 30 seconds, Julia
> takes around 15 seconds, while C++ takes only 0.23 seconds. Why is this
> happening? I have attached my code and a sample dataset.
>
> R code:
>
> DPY <- 252 ## days per year
> NWINDOW <- 126 ## can be smaller or larger than 252
>
> ds <- read.csv("xri.csv") ## a sample data set
>
> b.ols <- sd.ols <- rep(NA, nrow(ds))
>
> for (i in 1:nrow(ds)) {
>     thisday <- ds$day[i]
>     if (thisday %% DPY != 0) next ## calculate only on year end
>     if (thisday < DPY) next ## start only without NA
>     thisfm <- ds$fm[i]
>     datasubset <- subset( ds, (ds$fm==thisfm) &
>         (ds$day>=(thisday-NWINDOW)) & (ds$day<=(thisday-1)) )
>     olsreg <- lm(xr ~ xm, data = datasubset)
>     b.ols[i] <- coef(olsreg)[2]
>     sd.ols[i] <- sqrt(vcov(olsreg)[2, 2])
>     cat(i, " ") ## ping me to see we are not dead for large data sets
> }
>
> ds$b.ols <- b.ols
> ds$sd.ols <- sd.ols
>
> cat("\nOLS Beta Regressions are Done\n")
>
> ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T))
> ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T))
>
> cat("Cross-Sectional OLS Statistics are Done\n")
>
> ds <- within(ds, {
>     w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2)
>     b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4)
>     b.ols <- round(b.ols,4)
> })
>
> cat("OLS and VCK are Done. Now Writing Output.\n")
>
>
>
>
> Julia code:
> # load in the required package
> using DataFrames
> using DataFramesMeta
> using GLM
>
> tic()
> DPY = 252 ## days per year
> NWINDOW = 126 ## can be smaller or larger than 252
>
> ds = readtable("xri.csv") ## a sample data set
>
> # create two empty arrays to store the b_ols and sd_ols values
> b_ols = DataArray(Float64, size(ds)[1])
> sd_ols = DataArray(Float64, size(ds)[1])
>
> for i = 1:size(ds)[1]
>     thisDay = ds[i, :day] ## Julia DataFrame way of accessing data; in R: ds$day[i]
>     if mod(thisDay, DPY) != 0
>         continue
>     end
>     if thisDay < DPY
>         continue
>     end
>     thisFm = ds[i, :fm]
>     dataSubset = @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW))
>                         & (:day .<= (thisDay - 1)))
>     ## DataFramesMeta usage: fast subsetting of a DataFrame. The dot operator
>     ## denotes an element-wise operation, as in Matlab.
>     olsReg = fit(LinearModel, xr ~ xm, dataSubset) ## OLS from package GLM
>     b_ols[i] = coef(olsReg)[2] ## second OLS coefficient (the slope)
>     sd_ols[i] = stderr(olsReg)[2] ## standard error of that coefficient
>     print(i, " ")
> end
>
> # adding new columns to the ds dataframe
> ds[:b_ols] = b_ols
> ds[:sd_ols] = sd_ols
>
> print("\nOLS Beta Regressions are Done\n")
>
> ds = join(ds, by(ds, :day) do ds
>     DataFrame(xsect_mean = mean(dropna(ds[:b_ols])),
>               xsect_sd = std(dropna(ds[:b_ols])))
> end, on = [:day], kind = :inner)
> ds = sort!(ds)
>
> print("Cross-Sectional OLS Statistics are Done\n")
>
> # adding new columns and editing columns using DataFramesMeta
> ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2))
> ds[:b_vck] = @with(ds, round(:w_ols .* :b_ols + (1 - :w_ols) .*
>                              :xsect_mean, 4))
> ds[:b_ols] = @with(ds, round(:b_ols, 4))
>
> print("OLS and VCK are Done. Now Writing Output.\n")
>
> toc()
>
>
>
>