I spent some more time looking into this. I think the problem here might be 
more than just the value of norm that gets passed to the 
createTermFrequencyVectors methods. I will update this thread once I can 
root-cause the issue. I would appreciate any pointers or suggestions about 
what might be going wrong here.

-Shrinivas

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Wednesday, July 11, 2012 7:02 PM
To: [email protected]
Subject: Re: Potential regression in ASFEmail KMeans clustering

Speaking without actually looking into this, I would say that a -1 norm doesn't 
make good sense.  If the default value of the norm exponent changed to -1, it 
would wreak various kinds of havoc and would be a Bad Thing(tm).

However, regardless of that, since I haven't looked at the code in question my 
comment has a kind of low chance of being on target.  It is just that what you 
said does sound very plausible.
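
To make the point about the norm exponent concrete, here is a minimal sketch of p-norm normalization in plain Java. It does not use Mahout's API; the class and method names (NormSketch, pNorm, normalize) are hypothetical and exist only for illustration. It assumes a dense double[] vector and a finite exponent, and it rejects any exponent below 1, since the p-norm is only a true norm for p >= 1.

```java
// Illustrative sketch only: NormSketch, pNorm and normalize are hypothetical
// names, not part of Mahout or any other library.
final class NormSketch {

    // Computes (sum_i |v_i|^p)^(1/p) for a finite exponent p >= 1.
    static double pNorm(double[] v, double p) {
        // Rejects NaN, infinite, and any exponent below 1 (including -1).
        if (!(p >= 1.0) || Double.isInfinite(p)) {
            throw new IllegalArgumentException("norm exponent must be finite and >= 1, got " + p);
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    // Returns a copy of v scaled so that its p-norm is 1.
    // An all-zero vector is returned as all zeros to avoid dividing by zero.
    static double[] normalize(double[] v, double p) {
        double norm = pNorm(v, p);
        double[] out = new double[v.length];
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }
}
```

With an exponent of 2.0 this scales {3, 4} to {0.6, 0.8}. With an exponent of -1.0 the sketch throws rather than silently producing a meaningless scale factor. Whether a negative value is meant as a "do not normalize" sentinel in the code under discussion is something to check in the source, not something this sketch assumes.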

On Wed, Jul 11, 2012 at 3:14 PM, Joshi, Shrinivas
<[email protected]> wrote:

> ... Basically, with the current trunk and with the norm value of -1.0f 
> (which is what gets passed to 
> DictionaryVectorizer.createTermFrequencyVectors method in case 
> processIdf Boolean is true) I see no difference in the size of 
> tf-vectors and tf-idf vectors. ...
>
> If I pass norm value of 2.0f to
> DictionaryVectorizer.createTermFrequencyVectors method in the current 
> trunk then I do not see the regression.
>
