How exactly do we sum the gradients? Because if we just sum them as
vectors, we won't get the direction of cost-function minimization. That
only comes close to being true if we choose a very small learning rate,
but in that case we lose all the speedup from MapReduce. To see that the
sum of partial gradients doesn't give the right direction, just look at
the paraboloid z = x^2 + y^2: take two symmetric points on it, say (1, 1)
and (-1, -1), and the summed gradient will always be 0...
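To make that cancellation concrete, here is a small numerical sketch of
the example above (an illustration only, not code from any Mahout
implementation): the gradient of z = x^2 + y^2 is (2x, 2y), so the
gradients at the two symmetric points cancel exactly.

    # Gradient of z = x^2 + y^2 is (2x, 2y).
    def grad(x, y):
        return (2 * x, 2 * y)

    g1 = grad(1, 1)    # (2, 2)
    g2 = grad(-1, -1)  # (-2, -2)

    # Summing the two partial gradients cancels them out,
    # leaving no descent direction at all:
    total = (g1[0] + g2[0], g1[1] + g2[1])
    print(total)       # (0, 0)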