I spent some more time looking into this. I think the problem here might be more than just the value of norm that gets passed to the createTermFrequencyVectors methods. I will update this thread once I can root-cause this issue. I would appreciate it if anyone has pointers or suggestions about what might be the issue here.

-Shrinivas

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Wednesday, July 11, 2012 7:02 PM
To: [email protected]
Subject: Re: Potential regression in ASFEmail KMeans clustering

Speaking without actually looking into this, I would say that a -1 norm doesn't make good sense. If the default value of the norm exponent changed to -1, it would wreak various kinds of havoc and would be a Bad Thing(tm).

However, regardless of that, since I haven't looked at the code in question my comment has a kind of low chance of being on target. It is just that what you said does sound very plausible.

On Wed, Jul 11, 2012 at 3:14 PM, Joshi, Shrinivas <[email protected]> wrote:

> ... Basically, with the current trunk and with the norm value of -1.0f
> (which is what gets passed to
> DictionaryVectorizer.createTermFrequencyVectors method in case
> processIdf Boolean is true) I see no difference in the size of
> tf-vectors and tf-idf vectors. ...
>
> If I pass norm value of 2.0f to
> DictionaryVectorizer.createTermFrequencyVectors method in the current
> trunk then I do not see the regression.
>
