Hi Derek,
That makes sense. With the very, very tight cluster that your
clustering produced, you've uncovered an instability in that std
calculation. I'm going to rework that method today to use a more
numerically stable algorithm and will add a small prior in the
process. I'll also add a unit test that reproduces this problem
first. Look for a commit in a couple of hours.
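Roughly the kind of change I have in mind, as a sketch only (the names
below are placeholders, not the actual CDbwEvaluator method, and the
prior value is still to be decided): compute the mean first, then sum
squared deviations from it, and floor the result at a small prior.

  // Sketch only: names are placeholders, not the actual CDbwEvaluator
  // method, and MINIMAL_STD stands in for whatever prior we settle on.
  public final class StableStdSketch {

    private static final double MINIMAL_STD = 1.0e-6; // placeholder prior

    // Two-pass per-term std: compute the mean first, then sum squared
    // deviations from it, which avoids the cancellation in
    // s2*s0 - s1*s1 when the points are nearly identical.
    static double computeStd(double[] values) {
      int n = values.length;
      double mean = 0.0;
      for (double v : values) {
        mean += v;
      }
      mean /= n;
      double sumSq = 0.0;
      for (double v : values) {
        sumSq += (v - mean) * (v - mean);
      }
      // Floor at the small prior so a duplicate-point or single-point
      // cluster never yields a zero (or NaN) std.
      return Math.max(Math.sqrt(sumSq / n), MINIMAL_STD);
    }

    public static void main(String[] args) {
      double[] tight = {10000.0001, 10000.0001, 10000.0002};
      System.out.println(computeStd(tight));
    }
  }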
On 9/29/10 8:02 AM, Derek O'Callaghan wrote:
Hi Jeff,
FYI I checked the problem I was having in CDbwEvaluator with the
same dataset from the ClusterEvaluator thread; the problem is
occurring in the std calculation in CDbwEvaluator.computeStd(), where
s2.times(s0).minus(s1.times(s1)) generates negative values, which
then produce NaN in the subsequent SquareRootFunction(). This then
sets the average std to NaN later on in intraClusterDensity(). It's
happening for the cluster I have with the almost-identical points.
It's the same symptom as the problem last week, where this was
happening when s0 was 1. Is the solution to ignore these clusters,
like the s0 = 1 clusters? Or to add a small prior std as was done for
the similar issue in NormalModel.pdf()?
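For what it's worth, here's a tiny plain-Java illustration of the
cancellation (made-up numbers, not the actual CDbwEvaluator code):
with near-duplicate points the s2*s0 - s1*s1 value is dominated by
rounding error, so it can come out slightly negative, and sqrt() of a
negative double is NaN.

  // Plain-Java illustration with invented data, not Mahout code.
  public final class RadicandDemo {
    public static void main(String[] args) {
      double[] points = {10000.0001, 10000.0001, 10000.0002};

      double s0 = 0.0; // number of points
      double s1 = 0.0; // running sum of x
      double s2 = 0.0; // running sum of x * x
      for (double x : points) {
        s0++;
        s1 += x;
        s2 += x * x;
      }

      // Sums-of-squares form: the true value here is around 2e-8,
      // well below the rounding error of the ~9e8-sized products, so
      // the sign of the result is essentially noise and can go
      // negative, making the sqrt NaN.
      double radicand = s2 * s0 - s1 * s1;
      System.out.println("radicand = " + radicand);
      System.out.println("std = " + Math.sqrt(radicand) / s0);

      // Two-pass form for comparison: the radicand is a sum of
      // squares, so it can never be negative.
      double mean = s1 / s0;
      double sumSq = 0.0;
      for (double x : points) {
        sumSq += (x - mean) * (x - mean);
      }
      System.out.println("two-pass std = " + Math.sqrt(sumSq / s0));
    }
  }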
Thanks,
Derek
On 28/09/10 20:28, Jeff Eastman wrote:
Hi Ted,
The clustering code computes this value for cluster radius.
Currently, it is done with a running sums approach (s^0, s^1, s^2)
that computes the std of each vector term using:
Vector std = s2.times(s0).minus(s1.times(s1)).assign(new
SquareRootFunction()).divide(s0);
For CDbw, a single scalar average std value is needed, and this is
currently computed by averaging the vector terms:
double d = std.zSum() / std.size();
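In plain Java, just to spell out what that Vector expression is doing
(this assumes s0 is the point count and s1, s2 are the per-term
running sums of x and x^2; it's a paraphrase, not the actual Mahout
code):

  // Plain-Java paraphrase of the Vector expression above, assuming s0
  // is the point count and s1[i], s2[i] are the running sums of x_i
  // and x_i*x_i for each term. Not the actual Mahout code, just the
  // arithmetic it performs.
  public final class ScalarStdSketch {

    static double averageStd(double s0, double[] s1, double[] s2) {
      double[] std = new double[s1.length];
      for (int i = 0; i < std.length; i++) {
        // per-term std: sqrt(s0 * sum(x^2) - (sum x)^2) / s0
        std[i] = Math.sqrt(s2[i] * s0 - s1[i] * s1[i]) / s0;
      }
      // scalar value for CDbw: the mean of the per-term stds,
      // i.e. std.zSum() / std.size() in the Vector code
      double sum = 0.0;
      for (double v : std) {
        sum += v;
      }
      return sum / std.length;
    }

    public static void main(String[] args) {
      // three 2-d points: (1, 10), (2, 20), (3, 30)
      double s0 = 3.0;
      double[] s1 = {6.0, 60.0};
      double[] s2 = {14.0, 1400.0};
      System.out.println(averageStd(s0, s1, s2));
    }
  }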
The more I read about it, however, the less confident I am about this
approach. The paper itself seems to indicate a covariance approach,
but I am lost in their notation. See page 5, just above Definition 1.
www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf