Regularization is, as Jake says, quite separate from normalization. Normalization is scaling something so that some kind of norm of the result is 1.
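As a tiny illustration of that definition (a sketch only, using numpy), here is a vector scaled so that its L2 norm becomes 1:

```python
import numpy as np

# Normalization: divide a vector by its norm so the result has norm 1.
v = np.array([3.0, 4.0])          # L2 norm is 5.0
v_normalized = v / np.linalg.norm(v)

# The scaled vector has norm 1 (up to floating-point rounding).
print(np.linalg.norm(v_normalized))
```

The same idea applies with other norms (L1, max-norm) by swapping in the corresponding norm computation.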
Regularization is a way of trading off the complexity of a model against the accuracy of its fit to the training data, with the goal of better performance on unseen data. For large-scale data mining, regularization is absolutely required because the models used are potentially complex enough to act like a lookup table. If you restrict the model complexity, then you don't need as much training data, you will achieve lower than the best possible performance, and you won't need regularization quite so urgently.

Regularization can also be viewed from a large number of different angles that don't sound anything like my definition above. Some examples of regularization in practice include:

- limiting the number of variables you consider
- limiting the number of singular values you consider
- using weight decay in neural networks
- using small learning rates that decay to near zero
- limiting the number of clusters you use

The wikipedia article on the subject is terse, but pretty good:
http://en.wikipedia.org/wiki/Regularization_(mathematics)

On Wed, Jun 9, 2010 at 12:34 PM, Richard Simon Just <[email protected]> wrote:

> On 09/06/10 00:47, Ted Dunning wrote:
>>> @Jake and Sean
>>> My understanding is that the adding of biases and average rating to the
>>> prediction is based on what is done in terms of normalisation before the
>>> SVD computation. On that topic, could someone clarify the difference
>>> between normalization and regularization for me? And also where/if the
>>> two interact?
>>>
>> I'm not sure what kind of regularization we're doing here, actually...
>>
> I guess what I'm asking is, would regularization normally be a part of the
> normalization process? Or are they completely separate? In the literature,
> when normalization is talked about, they generally seem to be talking about
> imputation and the filling in of the null entries, whereas when
> regularization is mentioned, it's more gradient descent.
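To make the weight-decay flavor of regularization concrete, here is a hedged sketch of ridge regression (L2-penalized least squares) on synthetic data; all names and values below are illustrative, not from any particular Mahout code:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for the weights w.

    lam = 0 recovers ordinary least squares; larger lam shrinks the
    weights toward zero, trading training-set fit for a simpler model.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Small synthetic problem: 20 examples, 5 features, light noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=20)

w_ols = ridge_fit(X, y, lam=0.0)    # unregularized fit
w_reg = ridge_fit(X, y, lam=10.0)   # regularized fit

# The penalty shrinks the weight vector toward zero.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_ols))
```

The same tradeoff shows up in the other examples in the list: fewer singular values, fewer clusters, or a decaying learning rate all restrict how closely the model can memorize the training data.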
