How exactly do we combine the gradients? Because if we simply sum them as vectors, we won't get the direction of cost-function minimization. That seems to be close to true only if we choose a really small learning step, but in that case we lose all the speedup from MapReduce. To see that the sum of partial gradients doesn't give the right direction, just look at the paraboloid z = x^2 + y^2: take two symmetric points on it, for example (1, 1) and (-1, -1), and the summed gradient will always be 0...
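To make that example concrete, here is a minimal sketch in plain Python (the grad helper is just for illustration): it evaluates the gradient of z = x^2 + y^2 at the two symmetric points and shows that the sum is the zero vector.

def grad(x, y):
    """Gradient of z = x^2 + y^2 at the point (x, y): (dz/dx, dz/dy) = (2x, 2y)."""
    return (2 * x, 2 * y)

g1 = grad(1, 1)    # (2, 2)
g2 = grad(-1, -1)  # (-2, -2)

# Summing the gradients taken at the two symmetric points cancels them out.
summed = (g1[0] + g2[0], g1[1] + g2[1])
print(summed)      # (0, 0) -- no descent direction left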
