Mauro, Sean, and Tim, thanks for your help. 

Following your suggestions, I removed keyword arguments and split the 
function to avoid conditional statements. These helped a bit. 

But I got a surprising result after replacing BLAS functions with simple 
for loops, for loops is about 1.5x faster than BLAS calls. My Julia is 
compiled on my computer with the default configuration (the versioninfo() 
is listed below). Do you think it will help to compile a Julia with a 
faster and more optimized BLAS implementation such as Intel's MKL? 

Regards, Yang Zhixuan

在 2015年2月19日星期四 UTC+8下午10:51:20,Zhixuan Yang写道:
>
> Hello everyone, 
>
> Recently I'm working on my first Julia project, a word embedding training 
> program similar to Google's word2vec <https://code.google.com/p/word2vec/> 
> (the code 
> of word2vec is indeed very high-quality, but I want to add more features, 
> so I decided to write a new one). Thanks to Julia's expressiveness, it cost 
> me less than 2 days to write the entire program. But it runs really slow, 
> about 100x slower than the C code of word2vec (the algorithm is the same). 
>  I've been trying to optimize my code for several days (adding type 
> annotations, using BLAS to do computation, eliminating memory allocations 
> ...), but it is still 30x slower than the C code. 
>
> The critical part of my program is the following function (it also 
> consumes most of the time according to the profiling result):
>
> function train_one(c :: LinearClassifier, x :: Array{Float64}, y :: Int64; 
> α :: Float64 = 0.025, input_gradient :: Union(Nothing, Array{Float64}) = 
> nothing)
>     predict!(c, x)
>     c.outputs[y] -= 1
>
>     if input_gradient != nothing
>         # input_gradient = ( c.weights * outputs' )'
>         BLAS.gemv!('N', α, c.weights, c.outputs, 1.0, input_gradient)
>     end
>
>     # c.weights -= α * x' * outputs;
>     BLAS.ger!(-α, vec(x), c.outputs, c.weights)
> end
>
> function predict!(c :: LinearClassifier, x :: Array{Float64})
>     c.outputs = vec(softmax(x * c.weights))
> end
>
> type LinearClassifier
>     k :: Int64 # number of outputs
>     n :: Int64 # number of inputs
>     weights :: Array{Float64, 2} # k * n weight matrix
>
>     outputs :: Vector{Float64}
> end
>
> And the entire program can be found here 
> <https://github.com/yangzhixuan/embed>. Could you please check my code 
> and tell me what I can do to get performance comparable to C. 
>
> Regards.
> Yang Zhixuan
>

Reply via email to