Ted, L^1/L^2 normalization sounds like a good solution. I will try it out and report the results.
Is there any literature available comparing these normalization techniques? Thank you.

--shashi

On Thu, May 28, 2009 at 12:30 PM, Ted Dunning <[email protected]> wrote:
> Shashi,
>
> You are correct that this can be a problem, especially with vectors that
> have a large number of elements that are zero, but not known to be such.
>
> The definition as it stands is roughly an L^0 normalization. It is more
> common in clustering to use an L^1 or L^2 normalization. This would divide
> the terms by, respectively, the sum of the elements or the square root of
> the sum of the squares of the elements. Both L^1 and L^2 normalization
> avoid the problem you mention, since negligibly small elements will not
> contribute significantly to the norm.
>
> Traditionally, L^2 norms are used with documents. This dates back to Salton
> and the term-vector model of text retrieval. That practice was, however,
> based on somewhat inappropriate geometric intuitions. Other norms are quite
> plausibly more appropriate. For instance, if normalized term frequencies
> are considered to be estimates of word generation probabilities, then the
> L^1 norm is much more appropriate.
>
> On Wed, May 27, 2009 at 11:52 PM, Shashikant Kore <[email protected]> wrote:
>
>> ...
>> My concern in the following code is that the total is divided by
>> numPoints. For a term, only a few of the numPoints vectors have
>> contributed towards the weight. The rest had the value set to zero. That
>> drags down the average, and it is much more pronounced in a large set of
>> sparse vectors.
>>
>>
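
For concreteness, here is a rough sketch in plain Java of how I understand the L^1 and L^2 normalization you describe. This does not use Mahout's Vector classes; the hypothetical termId -> weight map just stands in for one of our sparse vectors, so only the non-zero entries contribute to the norm. Please correct me if I have misread the definitions.

import java.util.HashMap;
import java.util.Map;

public class SparseNormalization {

  /** L^1: divide each non-zero weight by the sum of the absolute values. */
  static Map<Integer, Double> l1Normalize(Map<Integer, Double> vector) {
    double norm = 0.0;
    for (double w : vector.values()) {
      norm += Math.abs(w);
    }
    return scale(vector, norm);
  }

  /** L^2: divide each non-zero weight by the square root of the sum of squares. */
  static Map<Integer, Double> l2Normalize(Map<Integer, Double> vector) {
    double norm = 0.0;
    for (double w : vector.values()) {
      norm += w * w;
    }
    return scale(vector, Math.sqrt(norm));
  }

  private static Map<Integer, Double> scale(Map<Integer, Double> vector, double norm) {
    Map<Integer, Double> result = new HashMap<Integer, Double>();
    if (norm == 0.0) {
      return result;  // an all-zero vector stays all-zero
    }
    for (Map.Entry<Integer, Double> e : vector.entrySet()) {
      result.put(e.getKey(), e.getValue() / norm);
    }
    return result;
  }

  public static void main(String[] args) {
    // termId -> weight; zero entries are simply absent, as in a sparse vector
    Map<Integer, Double> termWeights = new HashMap<Integer, Double>();
    termWeights.put(3, 2.0);
    termWeights.put(17, 1.0);
    termWeights.put(42, 1.0);
    System.out.println("L1: " + l1Normalize(termWeights));  // weights sum to 1
    System.out.println("L2: " + l2Normalize(termWeights));  // squared weights sum to 1
  }
}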
