Ugh... I tried to make this faster, but I couldn't.

It looks like one of the culprits is the "@where" line. On my computer your 
program takes 13 s, of which 5 s are spent on the "@where" line. As a test, I 
rewrote the program in plain Julia up to the "@where" line (which I replaced 
with native Julia), and the time for that part dropped from 5 s to 0.018 s. So 
we have evidence that, in principle, Julia can perform reasonably here. 
Unfortunately, the GLM module only seems to accept DataFrames (it's hard to 
tell; the documentation is very poor).
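For illustration, here is a minimal sketch of what a native-Julia replacement for the "@where" line can look like. It assumes the needed columns have first been copied out of the DataFrame into plain Vectors. The data and the `window_mask` helper are made up for this example (xri.csv isn't reproduced here), and it is written for current Julia, where elementwise `&` on arrays is spelled `.&`.

```julia
# Made-up stand-ins for the ds[:fm], ds[:day], ds[:xr] and ds[:xm] columns;
# the real xri.csv data is not reproduced here.
fm  = [1, 1, 1, 2, 2, 1]
day = [250, 251, 252, 251, 252, 253]
xr  = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
xm  = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]

const NWINDOW = 126

# Hypothetical helper: a Bool mask selecting the same rows as
#   @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW)) & (:day .<= (thisDay - 1)))
# but computed directly on plain Vectors, with no DataFrame involved.
function window_mask(fm, day, thisFm, thisDay, nwindow)
    return (fm .== thisFm) .& (day .>= thisDay - nwindow) .& (day .<= thisDay - 1)
end

# Rows of firm 1 in the NWINDOW days before day 252 (day 252 itself excluded).
mask = window_mask(fm, day, 1, 252, NWINDOW)
xr_sub = xr[mask]   # logical indexing returns a new Vector of the selected rows
xm_sub = xm[mask]
```

The masked vectors `xr_sub` and `xm_sub` are what the regression step would then consume.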

All in all, it seems to me that the features provided by DataFrames come at 
a significant speed penalty compared to a simple Julia implementation.
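Since GLM appears to want a DataFrame, one way around it is to skip GLM for this particular regression. With a single regressor plus an intercept, the slope and its standard error have closed forms, b = Sxy/Sxx and se(b) = sqrt(RSS/(n-2)/Sxx). These are the same quantities as `coef(olsReg)[2]` and `stderr(olsReg)[2]` in the quoted Julia code (and `coef(olsreg)[2]` and `sqrt(vcov(olsreg)[2, 2])` in the R code). Below is a sketch on plain vectors; the `ols_slope_se` name is hypothetical, and I have not benchmarked it against the original program.

```julia
using Statistics

# Hypothetical helper: slope b and its standard error for y = a + b*x,
# computed from closed-form OLS formulas, with no GLM or DataFrames involved.
function ols_slope_se(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    n = length(x)
    n == length(y) || throw(ArgumentError("x and y must have the same length"))
    n > 2 || throw(ArgumentError("need at least 3 observations for a standard error"))
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar)^2 for xi in x)                          # Sxx
    sxy = sum((xi - xbar) * (yi - ybar) for (xi, yi) in zip(x, y)) # Sxy
    b = sxy / sxx                   # slope
    a = ybar - b * xbar             # intercept
    rss = sum((yi - a - b * xi)^2 for (xi, yi) in zip(x, y))      # residual sum of squares
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b, se
end
```

Inside the loop, `ols_slope_se(xm_window, xr_window)` on the subset vectors would give the two numbers stored in `b_ols[i]` and `sd_ols[i]`.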

Caveat: I do not use DataFrames.

Cheers,
Daniel.


On Sunday, 23 August 2015 01:01:30 UTC+2, Michael Wang wrote:
>
> I am new to Julia. I heard that even though Julia is a high-level language, 
> it has the speed of C or C++. I have tested an example on my machine. Using 
> the same input and producing the same output, R takes around 30 seconds, 
> Julia around 15 seconds, while C++ takes only 0.23 seconds. Why is this 
> happening? I have attached my code and a sample dataset.
>
> R code:
>
> DPY <- 252  ## days per year
> NWINDOW <- 126  ## can be smaller or larger than 252
>
> ds <- read.csv("xri.csv")  ## a sample data set
>
> b.ols <- sd.ols <- rep(NA, nrow(ds))
>
> for (i in 1:nrow(ds)) {
>     thisday <- ds$day[i]
>     if (thisday %% DPY != 0) next  ## calculate only on year end
>     if (thisday < DPY) next  ## start only without NA
>     thisfm <- ds$fm[i]
>     datasubset <- subset(ds, (ds$fm == thisfm) &
>                          (ds$day >= (thisday - NWINDOW)) & (ds$day <= (thisday - 1)))
>     olsreg <- lm(xr ~ xm, data = datasubset)
>     b.ols[i] <- coef(olsreg)[2]
>     sd.ols[i] <- sqrt(vcov(olsreg)[2, 2])
>     cat(i, " ")  ## ping me to see we are not dead for large data sets
> }
>
> ds$b.ols <- b.ols
> ds$sd.ols <- sd.ols
>
> cat("\nOLS Beta Regressions are Done\n")
>
> ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T))
> ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T))
>
> cat("Cross-Sectional OLS Statistics are Done\n")
>
> ds <- within(ds, {
>                  w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2)
>                  b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4)
>                  b.ols <- round(b.ols,4)
>              })
>
> cat("OLS and VCK are Done.  Now Writing Output.\n")
>
> Julia code:
> # load in the required packages
> using DataFrames
> using DataFramesMeta
> using GLM
>
> tic()
> DPY = 252  ## days per year
> NWINDOW = 126  ## can be smaller or larger than 252
>
> ds = readtable("xri.csv")  ## a sample data set
>
> # create two empty arrays to store the b_ols and sd_ols values
> b_ols = DataArray(Float64, size(ds)[1])
> sd_ols = DataArray(Float64, size(ds)[1])
>
> for i = 1:size(ds)[1]
>     thisDay = ds[i, :day]  ## Julia DataFrame way of accessing data; in R: ds$day[i]
>     if mod(thisDay, DPY) != 0
>         continue
>     end
>     if thisDay < DPY
>         continue
>     end
>     thisFm = ds[i, :fm]
>     dataSubset = @where(ds, (:fm .== thisFm) & (:day .>= (thisDay - NWINDOW)) &
>                             (:day .<= (thisDay - 1)))
>     ## DataFramesMeta usage: fast subsetting of a DataFrame. As in Matlab, the
>     ## dot operators denote element-wise operations
>     olsReg = fit(LinearModel, xr ~ xm, dataSubset)  ## OLS from package GLM
>     b_ols[i] = coef(olsReg)[2]  ## the OLS slope coefficient
>     sd_ols[i] = stderr(olsReg)[2]  ## standard error of the OLS slope coefficient
>     print(i, " ")
> end
>
> # adding new columns to the ds dataframe
> ds[:b_ols] = b_ols
> ds[:sd_ols] = sd_ols
>
> print("\nOLS Beta Regressions are Done\n")
>
> ds = join(ds, by(ds, :day) do ds
>     DataFrame(xsect_mean = mean(dropna(ds[:b_ols])),
>               xsect_sd = std(dropna(ds[:b_ols])))
> end, on = [:day], kind = :inner)
> ds = sort!(ds)
>
> print("Cross-Sectional OLS Statistics are Done\n")
>
> # adding new columns and editing columns using DataFramesMeta
> ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2))
> ds[:b_vck] = @with(ds, round(:w_ols .* :b_ols + (1 - :w_ols) .* :xsect_mean, 4))
> ds[:b_ols] = @with(ds, round(:b_ols, 4))
>
> print("OLS and VCK are Done.  Now Writing Output.\n")
>
> toc()