Can you file a patch on the bug system? See https://issues.apache.org/jira/browse/MAHOUT
I don't know if you have done this before, but you simply create an issue and cut and paste your description into it. Then change the file in question and do a diff to get a patch. Attach that patch to the issue that you filed so that we can review your specific changes. This helps make sure that the change being discussed is known precisely.

On Mon, Feb 6, 2012 at 6:38 AM, Herta, Christian <[email protected]> wrote:

> Hello,
>
> Yesterday I checked the code of the gradient machine to understand what's
> going on there. I think I found a bug in the computation of the gradient
> (trunk):
>
> In the comment it's written: "dy / dw is just w since y = x' * w + b."
>
> This is wrong. dy/dw is x (ignoring the indices). The same mistake is made
> in the code. See the corrected version below.
>
> ----
>
> The gradient machine is a specialized version of a multi layer perceptron
> (MLP). In an MLP the gradient for computing the "weight change" for the
> output units is:
>
> dE / dw_ij = dE / dz_i * dz_i / dw_ij   with   z_i = sum_j (w_ij * a_j)
>
> here: i index of the output layer; j index of the hidden layer
> (d stands for the partial derivative)
>
> here: z_i = a_i (no squashing in the output layer)
>
> With the special loss (cost function) E = 1 - a_g + a_b = 1 - z_g + z_b
>
> with
> g: index of the output unit with target value +1 (positive class)
> b: index of a random output unit with target value 0
>
> =>
>
> dE / dw_gj = dE/dz_g * dz_g/dw_gj = -1 * a_j   (a_j: activity of the hidden unit j)
> dE / dw_bj = dE/dz_b * dz_b/dw_bj = +1 * a_j   (a_j: activity of the hidden unit j)
>
> That's the same as with the corrected comment: dy/dw = x (x is here the
> activation of the hidden unit), times (-1) for the weights to the output
> unit with target value +1.
>
> In neural network implementations it's common to compute the gradient
> numerically to test the implementation. This can be done by:
>
> dE/dw_ij = (E(w_ij + epsilon) - E(w_ij - epsilon)) / (2 * epsilon)
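>
> A self-contained toy sketch of that check (using E(w) = 0.5 * w * w in place
> of the real network loss, so the analytic gradient is simply w; only an
> illustration, not part of the patch):
>
>     public class GradientCheck {
>       public static void main(String[] args) {
>         // Toy loss E(w) = 0.5 * w * w, whose analytic gradient is dE/dw = w.
>         double w = 0.3;
>         double epsilon = 1.0e-6;
>         // Central difference: (E(w + eps) - E(w - eps)) / (2 * eps).
>         double lossPlus = 0.5 * (w + epsilon) * (w + epsilon);
>         double lossMinus = 0.5 * (w - epsilon) * (w - epsilon);
>         double numericGradient = (lossPlus - lossMinus) / (2.0 * epsilon);
>         // Should print a value very close to the analytic gradient 0.3.
>         System.out.println("numeric = " + numericGradient + ", analytic = " + w);
>       }
>     }
>
> Applied per weight of the gradient machine, the same comparison should
> reproduce the -a_j and +a_j values derived above.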
>
> Cheers
>
> Christian
>
> -----------------------------------
>
>     // Note: from the loss above, the gradient dloss/dy, y being the output
>     // for a label, is -1 for good and +1 for bad.
>     // dy / dw is just z since y = z' * w + b.
>     // Hence by the chain rule, dloss / dw_ij = dloss / dy_i * dy_i / dw_ij =
>     // -z_j (for i = good).
>     // For the regularization part, 0.5 * lambda * w' w, the gradient is lambda * w.
>     // dy / db = 1.
>
>     // gradient descent update of the weights to the
>     // positive (should-be) output-unit
>     Vector gradGood = hiddenActivation.clone();
>     gradGood.assign(Functions.NEGATE);
>     gradGood.assign(Functions.mult(-learningRate * (1.0 - regularization)));
>     outputWeights[good].assign(gradGood, Functions.PLUS);
>     outputBias.setQuick(good, outputBias.get(good) + learningRate);
>
>     // gradient descent update of the weights to the
>     // (random) negative (should-be) output-unit
>     Vector gradBad = hiddenActivation.clone();
>     gradBad.assign(Functions.mult(-learningRate * (1.0 + regularization)));
>     outputWeights[bad].assign(gradBad, Functions.PLUS);
>     outputBias.setQuick(bad, outputBias.get(bad) - learningRate);
>
>     // backpropagation from output to hidden layer for
>     // computing the deltas (errors) of the hidden units
>     Vector propHidden = outputWeights[good].clone();
>     propHidden.assign(Functions.NEGATE);
>     propHidden.assign(outputWeights[bad], Functions.PLUS);
>
>     // Gradient of the sigmoid (logistic function) is s * (1 - s).
>     Vector gradSig = hiddenActivation.clone();
>     gradSig.assign(Functions.SIGMOIDGRADIENT);
>     // Multiply by the change caused by the ranking loss.
>     for (int i = 0; i < numHidden; i++) {
>       gradSig.setQuick(i, gradSig.get(i) * propHidden.get(i));
>     }
>
>     // gradSig now holds the deltas (errors) of the hidden layer;
>     // the weight change of w_ij should be proportional
>     // to delta_i * x_j + regularization * w_ij
>     for (int i = 0; i < numHidden; i++) {
>       for (int j = 0; j < numFeatures; j++) {
>         double v = hiddenWeights[i].get(j);
>         v -= learningRate * (gradSig.get(i) + regularization * v);
>         hiddenWeights[i].setQuick(j, v);
>       }
>     }
>
>
> Prof. Dr. Christian Herta
> HTW Berlin
> Wilhelminenhofstraße 75A,
> 12459 Berlin, Gebäude C, Raum: 613
> Email: [email protected]
> Telefon: (030) 5019-3498
> Fax: (030) 5019-483498
