It is of course possible to write slow Julia code. Most of the time in your 
code is spent in the @where macro in DataFramesMeta, so you should look 
there first to see if things can be improved.
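
To illustrate one way of avoiding a fresh subset for every row, here is a
rough sketch. It assumes a current Julia 1.x release rather than the versions
used in this thread, and it assumes the four columns have been pulled out of
the data frame into plain vectors. The names firm, day, x, y, window_ols and
rolling_betas are made up for the example and do not come from any package.
It replaces the GLM fit with the closed-form slope and standard error of a
simple regression with an intercept, so treat it as a starting point, not a
drop-in replacement.

```julia
# Slope and standard error of y ~ 1 + x over the rows where
# firm == f and lo <= day <= hi. Running sums are accumulated in a
# single pass, so no temporary data frame is built per call.
function window_ols(firm, day, x, y, f, lo, hi)
    n = 0
    sx = sy = sxx = sxy = syy = 0.0
    for k in eachindex(firm)
        if firm[k] == f && lo <= day[k] <= hi
            n += 1
            sx += x[k]
            sy += y[k]
            sxx += x[k]^2
            sxy += x[k] * y[k]
            syy += y[k]^2
        end
    end
    n < 3 && return (NaN, NaN)        # too few points for a standard error
    Sxx = sxx - sx^2 / n              # centered sums of squares and products
    Sxy = sxy - sx * sy / n
    Syy = syy - sy^2 / n
    slope = Sxy / Sxx
    rss = max(Syy - slope * Sxy, 0.0) # residual sum of squares, clamped at 0
    return (slope, sqrt(rss / (n - 2) / Sxx))
end

# Same row filter as the original loop: only year-end days, and only
# once a full year of data exists. Everything lives inside a function,
# so there are no non-constant globals in the hot loop.
function rolling_betas(firm, day, x, y; dpy = 252, nwindow = 126)
    b = fill(NaN, length(day))
    se = fill(NaN, length(day))
    for i in eachindex(day)
        d = day[i]
        (d % dpy != 0 || d < dpy) && continue
        b[i], se[i] = window_ols(firm, day, x, y, firm[i], d - nwindow, d - 1)
    end
    return b, se
end

# Tiny synthetic data set (hypothetical values, y = 2x + 1 on days 1 to 3).
firm = [1, 1, 1, 1]
day = [1, 2, 3, 4]
x = [1.0, 2.0, 3.0, 0.0]
y = [3.0, 5.0, 7.0, 0.0]

rolling_betas(firm, day, x, y; dpy = 4, nwindow = 3)   # first call compiles
b, se = @time rolling_betas(firm, day, x, y; dpy = 4, nwindow = 3)
```

The first call is only there so that compilation is not included in the
timing. Each call to window_ols still scans every row, so for large data it
would probably pay to sort or group by firm first, but even this version
avoids allocating a new data frame on every iteration.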

On Sunday, August 23, 2015 at 7:30:59 AM UTC+2, Cedric St-Jean wrote:
>
> Also, Julia compiles code on the fly, so you should put it in a function, 
> then call it once before timing it to let it "warm up". You can also 
> profile the code and use ProfileView.jl to view where the bottleneck is. 
> There are many tricks to making things faster.
>
> On Saturday, August 22, 2015 at 5:05:21 PM UTC-6, Yichao Yu wrote:
>>
>> On Sat, Aug 22, 2015 at 6:55 PM, Michael Wang 
>> <[email protected]> wrote: 
>> > I am new to Julia. I heard that Julia's performance is comparable to C++ 
>> > even though it is a high-level language. I tested an example on my machine; 
>> > however, the result was that Julia was in the same ballpark as R, not 
>> > C++. Here is my code. 
>> > R: 
>> > ptm <- proc.time() 
>> > 
>> > DPY <- 252  ## days per year 
>> > NWINDOW <- 126  ## can be smaller or larger than 252 
>> > 
>> > ds <- read.csv("xri.csv")  ## a sample data set 
>> > 
>> > ## PS: this is much faster than assigning to a data frame in a loop 
>> > b.ols <- sd.ols <- rep(NA, nrow(ds)) 
>> > 
>> > for (i in 1:nrow(ds)) { 
>> >     thisday <- ds$day[i] 
>> >     if (thisday %% DPY != 0) next  ## calculate only on year end 
>> >     if (thisday < DPY) next  ## start only without NA 
>> >     thisfm <- ds$fm[i] 
>> >     datasubset <- subset(ds, (ds$fm == thisfm) & 
>> >                              (ds$day >= (thisday - NWINDOW)) & 
>> >                              (ds$day <= (thisday - 1))) 
>> >     olsreg <- lm(xr ~ xm, data = datasubset) 
>> >     b.ols[i] <- coef(olsreg)[2] 
>> >     sd.ols[i] <- sqrt(vcov(olsreg)[2, 2]) 
>> >     cat(i, " ")  ## ping me to see we are not dead for large data sets 
>> > } 
>> > 
>> > ds$b.ols <- b.ols 
>> > ds$sd.ols <- sd.ols 
>> > 
>> > cat("\nOLS Beta Regressions are Done\n") 
>> > 
>> > ds$xsect.sd <- ave(ds$b.ols, ds$day, FUN=function(x) sd(x, na.rm=T)) 
>> > ds$xsect.mean <- ave(ds$b.ols, ds$day, FUN=function(x) mean(x, na.rm=T)) 
>> > 
>> > cat("Cross-Sectional OLS Statistics are Done\n") 
>> > 
>> > ds <- within(ds, { 
>> >                  w.ols <- xsect.sd^2/(sd.ols^2+xsect.sd^2) 
>> >                  b.vck <- round(w.ols*b.ols + (1-w.ols)*xsect.mean,4) 
>> >                  b.ols <- round(b.ols,4) 
>> >              }) 
>> > 
>> > cat("OLS and VCK are Done.  Now Writing Output.\n") 
>> > 
>> > proc.time() - ptm 
>> > 
>> > 
>> > The running time is around 30 seconds for R. 
>> > 
>> > Julia: 
>> > using DataFrames 
>> > using DataFramesMeta 
>> > using GLM 
>> > 
>> > tic() 
>> > DPY = 252  ## days per year 
>> > NWINDOW = 126  ## can be smaller or larger than 252 
>> > 
>> > ds = readtable("xri.csv")  ## a sample data set 
>> > 
>> > # create two empty arrays to store b_ols and sd_ols value 
>> > b_ols = DataArray(Float64, size(ds)[1]) 
>> > sd_ols = DataArray(Float64, size(ds)[1]) 
>> > 
>> > for i = 1:size(ds)[1] 
>> >     thisDay = ds[i, :day]  ## Julia DataFrame way of accessing data, in R: ds$day[i] 
>> >     if mod(thisDay, DPY) != 0 
>> >         continue 
>> >     end 
>> >     if thisDay < DPY 
>> >         continue 
>> >     end 
>> >     thisFm = ds[i, :fm] 
>> >     dataSubset = @where(ds, (:fm .== thisFm) & 
>> >                             (:day .>= (thisDay - NWINDOW)) & 
>> >                             (:day .<= (thisDay - 1))) 
>> >     olsReg = fit(LinearModel, xr ~ xm, dataSubset)  ## OLS from package GLM 
>> >     b_ols[i] = coef(olsReg)[2]  ## returns the OLS coefficients 
>> >     sd_ols[i] = stderr(olsReg)[2]  ## returns the OLS coefficients' standard error 
>> >     print(i, " ") 
>> > end 
>> > 
>> > ds[:b_ols] = b_ols 
>> > ds[:sd_ols] = sd_ols 
>> > 
>> > print("\nOLS Beta Regressions are Done\n") 
>> > 
>> > ds = join(ds, by(ds, :day) do ds 
>> >     DataFrame(xsect_mean = mean(dropna(ds[:b_ols])), 
>> >               xsect_sd = std(dropna(ds[:b_ols]))) 
>> > end, on = [:day], kind = :inner) 
>> > ds = sort!(ds) 
>> > 
>> > print("Cross-Sectional OLS Statistics are Done\n") 
>> > 
>> > ds[:w_ols] = @with(ds, :xsect_sd.^2 ./ (:sd_ols.^2 + :xsect_sd.^2)) 
>> > ds[:b_vck] = @with(ds, round(:w_ols .* :b_ols + (1 - :w_ols) .* :xsect_mean, 4)) 
>> > ds[:b_ols] = @with(ds, round(:b_ols, 4)) 
>> > 
>> > print("OLS and VCK are Done.  Now Writing Output.\n") 
>> > 
>> > toc() 
>> > 
>> > The running time is around 15 seconds for Julia. 
>>
>> I'm not really familiar with DataFrames etc., but the most obvious thing is 
>> to avoid global variables: 
>>
>> http://julia.readthedocs.org/en/latest/manual/performance-tips/#avoid-global-variables
>>
>> > 
>> > I tried C++, too. Producing the same output as R and Julia, C++ took only 
>> > 0.23 seconds. Can someone tell me why this is happening? 
>> > 
>> > 
>> > 
>> > 
>> > 
>>
>
