The two things that jump out at me straight away are keyword arguments and 
nullable fields. Changing those keyword arguments into optional positional 
arguments might give you a bit of a boost.

The thing that will really make a difference though (I expect) is 
`input_gradient::Union(Nothing,Array{Float64})`. That will be a massive 
performance killer, and you should probably use an empty array instead (i.e. 
`input_gradient::Array{Float64}=Float64[]`). If that's not possible, 
consider splitting the function into two methods, one that has an 
`input_gradient` argument and one that doesn't; then you can do your `nothing` 
check outside the function.
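
To make the two-method idea concrete, here is a rough sketch rather than a drop-in replacement. It assumes current Julia (1.x) syntax, so `mutable struct` and `using LinearAlgebra` stand in for the `type` and bare `BLAS` of the original. It also assumes `weights` is laid out as n x k (inputs x outputs), which is what the BLAS calls in your function imply. `softmax!` is a hypothetical stand-in for your `softmax`, which I haven't seen, and `train_one!` is just your `train_one` with `α` made an optional positional argument.

```julia
using LinearAlgebra  # brings BLAS.gemv! and BLAS.ger! into scope on Julia 1.x

# Assumed layout: weights is n x k (inputs x outputs).
mutable struct LinearClassifier
    weights::Matrix{Float64}
    outputs::Vector{Float64}
end

# Hypothetical in-place softmax; the original softmax is not shown in the thread.
function softmax!(v::Vector{Float64})
    m = maximum(v)
    s = 0.0
    for i in eachindex(v)
        v[i] = exp(v[i] - m)   # subtract the max for numerical stability
        s += v[i]
    end
    for i in eachindex(v)
        v[i] /= s
    end
    return v
end

# outputs = softmax(weights' * x), written into the existing buffer.
function predict!(c::LinearClassifier, x::Vector{Float64})
    BLAS.gemv!('T', 1.0, c.weights, x, 0.0, c.outputs)
    return softmax!(c.outputs)
end

# Method 1: no input gradient wanted. α is an optional positional argument.
function train_one!(c::LinearClassifier, x::Vector{Float64}, y::Int,
                    α::Float64 = 0.025)
    predict!(c, x)
    c.outputs[y] -= 1
    BLAS.ger!(-α, x, c.outputs, c.weights)   # weights -= α * x * outputs'
    return c
end

# Method 2: also accumulate α * weights * outputs into input_gradient.
function train_one!(c::LinearClassifier, x::Vector{Float64}, y::Int,
                    α::Float64, input_gradient::Vector{Float64})
    predict!(c, x)
    c.outputs[y] -= 1
    BLAS.gemv!('N', α, c.weights, c.outputs, 1.0, input_gradient)
    BLAS.ger!(-α, x, c.outputs, c.weights)
    return c
end
```

The caller then decides once, outside the hot loop, which method to call, so neither method ever sees a `Union` type or a keyword argument. Every argument has a concrete type, which should make it easier for the compiler to generate tight code for each method.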

On Thursday, 19 February 2015 14:51:20 UTC, Zhixuan Yang wrote:
>
> Hello everyone, 
>
> Recently I've been working on my first Julia project, a word embedding 
> training program similar to Google's word2vec 
> <https://code.google.com/p/word2vec/> (the code of word2vec is indeed very 
> high-quality, but I want to add more features, so I decided to write a new 
> one). Thanks to Julia's expressiveness, it took me less than 2 days to write 
> the entire program. But it runs really slowly, about 100x slower than the C 
> code of word2vec (the algorithm is the same). I've been trying to optimize 
> my code for several days (adding type annotations, using BLAS to do 
> computation, eliminating memory allocations ...), but it is still 30x 
> slower than the C code. 
>
> The critical part of my program is the following function (it also 
> consumes most of the time according to the profiling result):
>
> function train_one(c :: LinearClassifier, x :: Array{Float64}, y :: Int64; 
> α :: Float64 = 0.025, input_gradient :: Union(Nothing, Array{Float64}) = 
> nothing)
>     predict!(c, x)
>     c.outputs[y] -= 1
>
>     if input_gradient != nothing
>         # input_gradient = ( c.weights * outputs' )'
>         BLAS.gemv!('N', α, c.weights, c.outputs, 1.0, input_gradient)
>     end
>
>     # c.weights -= α * x' * outputs;
>     BLAS.ger!(-α, vec(x), c.outputs, c.weights)
> end
>
> function predict!(c :: LinearClassifier, x :: Array{Float64})
>     c.outputs = vec(softmax(x * c.weights))
> end
>
> type LinearClassifier
>     k :: Int64 # number of outputs
>     n :: Int64 # number of inputs
>     weights :: Array{Float64, 2} # n * k weight matrix (inputs x outputs)
>
>     outputs :: Vector{Float64}
> end
>
> And the entire program can be found here 
> <https://github.com/yangzhixuan/embed>. Could you please check my code 
> and tell me what I can do to get performance comparable to C. 
>
> Regards.
> Yang Zhixuan
>
