[julia-users] Re: Speed Comparison "Julia vs R vs C++" a question.

Sean Marshallsay Sun, 23 Aug 2015 07:09:12 -0700

First, read the performance tips thoroughly: 
http://docs.julialang.org/en/release-0.3/manual/performance-tips/


Second, I'd avoid IO when benchmarking unless IO is a necessary part of the 
operation being measured.
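
To illustrate the IO point, here is a minimal sketch using only Base Julia. The function `running_total` and its `verbose` keyword are made up for this example and are not from the original code. The idea is to keep progress printing out of the timed run and to warm up once so compilation is not measured.

```julia
# Hypothetical example: a loop with optional progress output.
function running_total(v; verbose=false)
    total = 0.0
    for i in 1:length(v)
        total += v[i]
        if verbose
            print(i, " ")   # progress output; terminal IO is slow compared with the arithmetic
        end
    end
    return total
end

v = rand(10^5)                        # input is built before timing starts
running_total(v)                      # warm-up call so compilation is not timed
t_quiet = @elapsed running_total(v)   # times the computation only, no printing
println("without progress output: ", t_quiet, " seconds")
```

Calling `running_total(v, verbose=true)` under `@elapsed` should show how much of the wall time goes to printing rather than computing.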

My guess is you'll get a big speedup just by putting everything in a 
function, since code at global scope works on non-constant globals that the 
compiler cannot optimise well, but there'll be other optimisations to make 
as well.
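
As a rough sketch of what "putting everything in a function" looks like (Base Julia only; `sum_squares` is a made-up name, not something from the original script): the work goes inside a function, the data is passed in as an argument, and the timing is taken after one warm-up call.

```julia
# Hypothetical example: the hot loop lives in a function and takes its data as an argument,
# so the compiler can specialise on the concrete argument type.
function sum_squares(v)
    total = zero(eltype(v))
    for x in v
        total += x * x
    end
    return total
end

data = rand(10^6)                 # built once, outside the timed region
sum_squares(data)                 # first call includes compilation, so warm up first
t = @elapsed sum_squares(data)    # time the second call only
println("in a function: ", t, " seconds")
```

The same pattern would apply to the regression loop: wrap it in a function that takes the data frame, `DPY` and `NWINDOW` as arguments instead of reading them as globals.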

On Sunday, 23 August 2015 00:01:30 UTC+1, Michael Wang wrote:
>
> I am new to Julia. I have heard that even though Julia is a high-level 
> language, it has the speed of C or C++. I tested an example on my machine. 
> With the same input and the same output, R takes around 30 seconds, Julia 
> takes around 15 seconds, while C++ takes only 0.23 seconds. Why is this 
> happening? I have attached my code and a sample dataset.
>
> R code:
>
> DPY <- 252  ## days per year
> NWINDOW <- 126  ## can be smaller or larger than 252
>
> ds <- read.csv("xri.csv")  ## a sample data set
>
> b.ols <- sd.ols <- rep(NA, nrow(ds))
>
> for (i in 1:nrow(ds)) {
>     thisday <- ds$day[i]
>     if (thisday %% DPY != 0) next  ## calculate only on year end
>     if (thisday < DPY) next  ## start only without NA
>     thisfm <- ds$fm[i]
>     datasubset <- subset(ds, (ds$fm == thisfm) &
>                          (ds$day >= (thisday - NWINDOW)) & (ds$day <= (thisday - 1)))
>     olsreg <- lm(xr ~ xm, data = datasubset)
>     b.ols[i] <- coef(olsreg)[2]
>     sd.ols[i] <- sqrt(vcov(olsreg)[2, 2])
>     cat(i, " ")  ## ping me to see we are not dead for large data sets
> }
>
> ds$b.ols <- b.ols
> ds$sd.ols <- sd.ols
>
> cat("\nOLS Beta Regressions are Done\n")
>
> ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T))
> ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T))
>
> cat("Cross-Sectional OLS Statistics are Done\n")
>
> ds <- within(ds, {
>                  w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2)
>                  b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4)
>                  b.ols <- round(b.ols,4)
>              })
>
> cat("OLS and VCK are Done.  Now Writing Output.\n")
>
>
>
>
> Julia code:
> # load in the required package
> using DataFrames
> using DataFramesMeta
> using GLM
>
> tic()
> DPY = 252  ## days per year
> NWINDOW = 126  ## can be smaller or larger than 252
>
> ds = readtable("xri.csv")  ## a sample data set
>
> # create two empty arrays to store b_ols and sd_ols value
> b_ols = DataArray(Float64, size(ds)[1])
> sd_ols = DataArray(Float64, size(ds)[1])
>
> for i = 1:size(ds)[1]
>     thisDay = ds[i, :day]  ## Julia DataFrame way of accessing data; in R: ds$day[i]
>     if mod(thisDay, DPY) != 0
>         continue
>     end
>     if thisDay < DPY
>         continue
>     end
>     thisFm = ds[i, :fm]
>     ## DataFramesMeta usage: fast subsetting of a data frame. The dot operators are
>     ## element-wise operations, the same as in Matlab.
>     dataSubset = @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW)) &
>                         (:day .<= (thisDay - 1)))
>     olsReg = fit(LinearModel, xr ~ xm, dataSubset)  ## OLS from package GLM
>     b_ols[i] = coef(olsReg)[2]  ## second OLS coefficient (the slope)
>     sd_ols[i] = stderr(olsReg)[2]  ## standard error of that coefficient
>     print(i, " ")
> end
>
> # adding new columns to the ds dataframe
> ds[:b_ols] = b_ols
> ds[:sd_ols] = sd_ols
>
> print("\nOLS Beta Regressions are Done\n")
>
> ds = join(ds, by(ds, :day) do ds
>     DataFrame(xsect_mean = mean(dropna(ds[:b_ols])),
>               xsect_sd = std(dropna(ds[:b_ols])))
> end, on = [:day], kind = :inner)
> ds = sort!(ds)
>
> print("Cross-Sectional OLS Statistics are Done\n")
>
> # adding new columns and editing columns using DataFramesMeta
> ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2))
> ds[:b_vck] = @with(ds, round(:w_ols .* :b_ols + (1 - :w_ols) .* :xsect_mean, 4))
> ds[:b_ols] = @with(ds, round(:b_ols, 4))
>
> print("OLS and VCK are Done.  Now Writing Output.\n")
>
> toc()
>
>
>
>
