On Sat, Aug 22, 2015 at 6:55 PM, Michael Wang
<[email protected]> wrote:
> I am new to Julia. I heard that Julia has performance comparable to C++ even
> though it is a high-level language. I tested an example on my machine;
> however, Julia turned out to be in the same ballpark as R, not C++. Here is
> my code.
> R:
> ptm <- proc.time()
>
> DPY <- 252  ## days per year
> NWINDOW <- 126  ## can be smaller or larger than 252
>
> ds <- read.csv("xri.csv")  ## a sample data set
>
> ## PS: this is much faster than assigning to a data frame in a loop
> b.ols <- sd.ols <- rep(NA, nrow(ds))
>
> for (i in 1:nrow(ds)) {
>     thisday <- ds$day[i]
>     if (thisday %% DPY != 0) next  ## calculate only on year end
>     if (thisday < DPY) next  ## start only without NA
>     thisfm <- ds$fm[i]
>     datasubset <- subset(ds, (ds$fm == thisfm) & (ds$day >= (thisday - NWINDOW)) &
>                              (ds$day <= (thisday - 1)))
>     olsreg <- lm(xr ~ xm, data = datasubset)
>     b.ols[i] <- coef(olsreg)[2]
>     sd.ols[i] <- sqrt(vcov(olsreg)[2, 2])
>     cat(i, " ")  ## ping me to see we are not dead for large data sets
> }
>
> ds$b.ols <- b.ols
> ds$sd.ols <- sd.ols
>
> cat("\nOLS Beta Regressions are Done\n")
>
> ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T))
> ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T))
>
> cat("Cross-Sectional OLS Statistics are Done\n")
>
> ds <- within(ds, {
>                  w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2)
>                  b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4)
>                  b.ols <- round(b.ols,4)
>              })
>
> cat("OLS and VCK are Done.  Now Writing Output.\n")
>
> proc.time() - ptm
>
>
> The running time is around 30 seconds for R.
>
> Julia:
> using DataFrames
> using DataFramesMeta
> using GLM
>
> tic()
> DPY = 252  ## days per year
> NWINDOW = 126  ## can be smaller or larger than 252
>
> ds = readtable("xri.csv")  ## a sample data set
>
> # preallocate two arrays (initially all NA) to store the b_ols and sd_ols values
> b_ols = DataArray(Float64, size(ds)[1])
> sd_ols = DataArray(Float64, size(ds)[1])
>
> for i = 1:size(ds)[1]
>     thisDay = ds[i, :day]  ## Julia DataFrame way of accessing data; in R: ds$day[i]
>     if mod(thisDay, DPY) != 0
>         continue
>     end
>     if thisDay < DPY
>         continue
>     end
>     thisFm = ds[i, :fm]
>     dataSubset = @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW)) &
>                             (:day .<= (thisDay - 1)))
>     olsReg = fit(LinearModel, xr ~ xm, dataSubset)  ## OLS from package GLM
>     b_ols[i] = coef(olsReg)[2]     ## coef returns the OLS coefficients; [2] is the slope
>     sd_ols[i] = stderr(olsReg)[2]  ## standard error of the slope coefficient
>     print(i, " ")
> end
>
> ds[:b_ols] = b_ols
> ds[:sd_ols] = sd_ols
>
> print("\nOLS Beta Regressions are Done\n")
>
> ds = join(ds, by(ds, :day) do ds
>         DataFrame(xsect_mean = mean(dropna(ds[:b_ols])),
>                   xsect_sd = std(dropna(ds[:b_ols])))
>     end, on = [:day], kind = :inner)
> ds = sort!(ds)
>
> print("Cross-Sectional OLS Statistics are Done\n")
>
> ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2))
> ds[:b_vck] = @with(ds,
>     round(:w_ols .* :b_ols + (1 - :w_ols) .* :xsect_mean, 4))
> ds[:b_ols] = @with(ds, round(:b_ols, 4))
>
> print("OLS and VCK are Done.  Now Writing Output.\n")
>
> toc()
>
> The running time is around 15 seconds for Julia.

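I'm not really familiar with DataFrames etc., but the most obvious thing is
that the whole script, including the loop, runs at global scope. Non-const
globals such as `ds`, `DPY` and `NWINDOW` have no fixed type, so the compiler
cannot specialize the loop on them; see
http://julia.readthedocs.org/en/latest/manual/performance-tips/#avoid-global-variables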
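
As a rough sketch of what moving the work out of global scope could look like (my own illustration, not code from the original post): the loop is wrapped in a function that takes the columns as plain vectors, and each regression is done with the closed-form OLS formulas instead of GLM. `rolling_beta` and `ols_slope_se` are hypothetical names; NaN stands in for NA; the columns are assumed to contain no missing values; and it is written for Julia 1.x, since several calls in the post (e.g. `readtable`, `tic`/`toc`) are not available in current versions.

```julia
using Statistics  # mean

# Hypothetical helper (not part of GLM): closed-form OLS for y ~ 1 + x.
# Returns the slope and its standard error, i.e. the two numbers the original
# loop reads from coef(olsReg)[2] and stderr(olsReg)[2].
function ols_slope_se(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    n = length(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum(abs2, x .- xbar)                 # sum of squared deviations of x
    sxy = sum((x .- xbar) .* (y .- ybar))      # sum of cross deviations
    b = sxy / sxx                              # slope
    a = ybar - b * xbar                        # intercept
    rss = sum(abs2, y .- a .- b .* x)          # residual sum of squares
    return b, sqrt(rss / (n - 2) / sxx)        # slope and its standard error
end

# Hypothetical rewrite of the loop as a function: the columns arrive as
# arguments instead of being read from globals, so Julia compiles a version
# specialized on their concrete element types.
function rolling_beta(day::AbstractVector, fm::AbstractVector,
                      xr::AbstractVector{<:Real}, xm::AbstractVector{<:Real};
                      dpy::Integer = 252, nwindow::Integer = 126)
    n = length(day)
    b_ols = fill(NaN, n)    # NaN plays the role of NA here
    sd_ols = fill(NaN, n)
    for i in 1:n
        thisday = day[i]
        # calculate only on year end, and only once a full year is available
        (mod(thisday, dpy) != 0 || thisday < dpy) && continue
        # same rows as the @where call: this firm, the previous nwindow days
        mask = (fm .== fm[i]) .& (day .>= thisday - nwindow) .& (day .<= thisday - 1)
        count(mask) < 3 && continue   # assumption: skip windows too small for an SE
        b_ols[i], sd_ols[i] = ols_slope_se(xm[mask], xr[mask])  # regress xr on xm
    end
    return b_ols, sd_ols
end
```

With a recent DataFrames.jl this might be called as `b, se = rolling_beta(ds.day, ds.fm, ds.xr, ds.xm)`. Like the original, each year-end row still scans every row to build its mask; sorting by `fm` and `day` first would let each window be a contiguous slice, but that is a separate optimization from getting the loop out of global scope.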

>
> I tried C++, too. Producing the same output as R and Julia, C++ took only
> 0.23 seconds. Can someone tell me why this is happening?
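
For reference, the per-window quantities the scripts compute can be written out explicitly. The first two lines are the standard OLS formulas for regressing $y$ = `xr` on $x$ = `xm` with an intercept over the $n$ rows of one window (what R reports as `coef(olsreg)[2]` and `sqrt(vcov(olsreg)[2, 2])`); the last line restates the `w.ols` and `b.vck` steps, with $s_d$ and $\bar b_d$ the cross-sectional standard deviation and mean of $\hat\beta$ on day $d$ (`xsect.sd` and `xsect.mean`). The symbols are introduced here for illustration only.

```latex
\begin{aligned}
\hat\beta &= \frac{\sum_t (x_t-\bar x)(y_t-\bar y)}{\sum_t (x_t-\bar x)^2},
  \qquad \hat\alpha = \bar y - \hat\beta\,\bar x,\\
\widehat{\mathrm{se}}(\hat\beta) &= \sqrt{\frac{\sum_t (y_t-\hat\alpha-\hat\beta x_t)^2/(n-2)}{\sum_t (x_t-\bar x)^2}},\\
w &= \frac{s_d^2}{\widehat{\mathrm{se}}(\hat\beta)^2 + s_d^2},
  \qquad b^{\mathrm{VCK}} = w\,\hat\beta + (1-w)\,\bar b_d .
\end{aligned}
```

So each window needs only a few sums over its rows rather than a general model-fitting call.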
