Hi Mosè,

I reworked the internals of AutoGrad quite a bit.  The latest 
master runs about twice as fast as v0.0.1 (and quite a bit faster than the 
original Python autograd package).  The current performance is OK in 
array-heavy code (such as training deep learning models, where most of the 
heavy lifting is done by array ops on the GPU):

julia> sumsin(x)=sum(sin(x))
sumsin (generic function with 1 method)

julia> COS=grad(sumsin)
gradfun (generic function with 1 method)

julia> x=rand(1000,1000);

julia> @benchmark cos(x)
BenchmarkTools.Trial: 
  samples:          422
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  2
  minimum time:     10.41 ms (0.00% GC)
  median time:      12.01 ms (0.00% GC)
  mean time:        11.57 ms (0.00% GC)
  maximum time:     13.00 ms (0.00% GC)

julia> @benchmark COS(x)
BenchmarkTools.Trial: 
  samples:          178
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  38.15 mb
  allocs estimate:  76
  minimum time:     25.89 ms (2.56% GC)
  median time:      27.51 ms (4.59% GC)
  mean time:        28.18 ms (4.21% GC)
  maximum time:     31.12 ms (4.09% GC)
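
As a sanity check on the example above (a sketch, assuming the same `grad` API from AutoGrad used in the benchmark), the gradient of sum(sin(x)) with respect to x is cos.(x), so COS(x) should agree with cos(x) elementwise:

```julia
using AutoGrad

sumsin(x) = sum(sin(x))   # scalar-valued function of an array
COS = grad(sumsin)        # returns the gradient function

x = rand(1000, 1000)
# The AD result should match the analytic derivative:
isapprox(COS(x), cos(x))  # expected to hold up to floating-point error
```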



However, no amount of optimization within the current design (recording 
every primitive operation, then backpropagating the gradients) is going to 
close the ns-vs-μs gap you point out below.  For fast scalar operations I 
recommend the ReverseDiffSource package.  For fast prototyping of complex 
machine learning models (especially when the operations depend on the 
input, as in sequence RNN models) I recommend AutoGrad.  I will publish 
some more detailed machine learning benchmarks soon.
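
To illustrate where that fixed per-call cost comes from, here is a minimal sketch of taping (hypothetical code, not AutoGrad's actual internals): every primitive must allocate a node, capture a closure for its pullback, and push it onto a tape, and that bookkeeping alone dwarfs the ~6 ns of a scalar `sin`:

```julia
# Hypothetical sketch of tape-based reverse-mode AD for one scalar op.
struct Node
    value::Float64
    back::Function        # maps the output gradient to the input gradient
end

const tape = Node[]       # global tape for this sketch

function traced_sin(x::Float64)
    y = sin(x)
    # Record the pullback: d(sin(x))/dx = cos(x).
    push!(tape, Node(y, g -> g * cos(x)))
    return y
end
# The allocation, closure capture, and push! happen on every call,
# which is why scalar gradients pay microseconds rather than nanoseconds.
```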

best,
deniz

P.S. I am adding more of your derivatives; I'll make another release when 
they are reasonably complete.


On Monday, August 29, 2016 at 3:55:12 PM UTC+3, Mosè Giordano wrote:
>
> Hi Tom,
>
>
> Mose: what version of julia are you on? Anonymous functions and closures 
>> are much faster on 0.5... In fact there should be no performance penalty vs 
>> regular functions, which allows you to rethink your paradigm. 
>>
>
> It was Julia 0.4.6, but I get similar results also with Julia 0.6-dev.72:
>
> julia> using BenchmarkTools, AutoGrad
>
> julia> COS = grad(sin)
> (::gradfun) (generic function with 1 method)
>
> julia> @benchmark cos(0.0)
> BenchmarkTools.Trial: 
>   samples:          10000
>   evals/sample:     1000
>   time tolerance:   5.00%
>   memory tolerance: 1.00%
>   memory estimate:  0.00 bytes
>   allocs estimate:  0
>   minimum time:     6.00 ns (0.00% GC)
>   median time:      6.00 ns (0.00% GC)
>   mean time:        6.03 ns (0.00% GC)
>   maximum time:     19.00 ns (0.00% GC)
>
> julia> @benchmark COS(0.0)
> BenchmarkTools.Trial: 
>   samples:          10000
>   evals/sample:     1
>   time tolerance:   5.00%
>   memory tolerance: 1.00%
>   memory estimate:  4.05 kb
>   allocs estimate:  71
>   minimum time:     23.07 μs (0.00% GC)
>   median time:      24.27 μs (0.00% GC)
>   mean time:        25.36 μs (1.63% GC)
>   maximum time:     4.23 ms (97.76% GC)
>
> Bye,
> Mosè
>
