Regularization is, as Jake says, quite separate from normalization.

Normalization is scaling something so that some norm of the scaled
result is 1.
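
As a tiny illustrative sketch (my own toy example, not from this thread),
here is what that looks like with the ordinary L2 (Euclidean) norm; the
function name is mine:

```python
import math

def normalize(v):
    # Scale v so that its L2 (Euclidean) norm is exactly 1.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# [3, 4] has L2 norm 5, so the normalized vector is [0.6, 0.8]
print(normalize([3.0, 4.0]))
```

Any other norm (L1, max-norm, etc.) works the same way: divide by it.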

Regularization is a way of trading off complexity of a model against
accuracy of a fit to the training data with the goal of having better
performance on unseen data.  For large scale datamining, regularization is
absolutely required because the models used are potentially complex enough
to act like a lookup table.  If you restrict the model complexity, then you
don't need so much training data, will achieve lower than best possible
performance and won't need regularization quite so urgently.  Regularization
can also be viewed from a large number of different angles that don't sound
anything like my definition above.

Some examples of regularization in practice include:

- limiting the number of variables you consider

- limiting the number of singular values you consider

- using weight decay in neural networks

- using small learning rates that decay to near zero

- limiting the number of clusters you use
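
To make the weight-decay bullet concrete, here is a minimal sketch (my own
toy example, not Mahout code): gradient descent on a one-parameter
least-squares fit, where an L2 penalty lam * w**2 pulls the weight toward
zero on every step.

```python
def fit(xs, ys, lam, lr=0.1, steps=200):
    # Gradient descent on mean squared error plus an L2 penalty lam * w**2.
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad += 2 * lam * w  # the weight-decay (regularization) term
        w -= lr * grad
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # noise-free data, true slope 2
w_unreg = fit(xs, ys, lam=0.0)  # converges to slope 2
w_reg = fit(xs, ys, lam=1.0)    # penalty shrinks the slope below 2
print(w_unreg, w_reg)
```

With lam=0 the fit recovers the true slope; with lam=1 it deliberately
accepts a worse fit to the training data in exchange for a smaller
(simpler) model, which is the trade-off described above.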

The Wikipedia article on the subject is terse, but pretty good:
http://en.wikipedia.org/wiki/Regularization_(mathematics)

On Wed, Jun 9, 2010 at 12:34 PM, Richard Simon Just <
[email protected]> wrote:

> On 09/06/10 00:47, Ted Dunning wrote:
>>> @Jake and Sean
>>> My understanding is that the adding of biases and average rating to the
>>> prediction is based on what is done in terms of normalisation before the
>>> SVD
>>> computation. On that topic could someone clarify the difference between
>>> normalization and regularization for me? and also where/if the two
>>> interact?
>>>
>>>
>> I'm not sure what kind of regularization we're doing here, actually...
>>
> I guess what I'm asking is, would regularization normally be a part of the
> normalization process? or are they completely separate? In the literature
> when normalization is talked about they generally seem to be talking about
> imputation and the filling in of the null entries. Whereas when
> regularization is mentioned it's more gradient descent.
