If we just need to verify with some sample dataset, we already have the data in the TestFuzzyKMeansClustering code. Won't that suffice? Otherwise, I need to generate a sample dataset manually, as I don't have a small dataset like this with me. I am actually running on MovieLens data, using movie ratings as the vector (movie as dimension, rating as coefficient) and the user as the point.
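
For reference, the vector layout described above (user as point, movie as dimension, rating as coefficient) could be sketched roughly as follows. This is only an illustration in plain Java: the class name `UserRatingVectors`, the `build` helper, and the `{userId, movieId, rating}` input triples are assumptions for the example, not Mahout classes or the actual MovieLens file format.

```java
import java.util.Map;
import java.util.TreeMap;

public class UserRatingVectors {
    // Hypothetical helper: one sparse vector per user, stored as movieId -> rating.
    // Movies a user has not rated are simply absent from that user's map.
    static Map<Integer, Map<Integer, Double>> build(int[][] ratings) {
        Map<Integer, Map<Integer, Double>> users = new TreeMap<>();
        for (int[] r : ratings) {
            // r = {userId, movieId, rating} (assumed layout for this sketch)
            users.computeIfAbsent(r[0], k -> new TreeMap<>()).put(r[1], (double) r[2]);
        }
        return users;
    }

    public static void main(String[] args) {
        int[][] ratings = { {1, 10, 5}, {1, 20, 3}, {2, 10, 4} };
        // Prints each user's sparse rating vector.
        System.out.println(build(ratings));
    }
}
```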

Thanks
Pallavi

Robin Anil wrote:
I tracked the versions back to before the change to Writables was made.
There is no significant change in the code.

Can you give me a small dataset, maybe 10 points in 5 dimensions? I can
verify the trunk in that case.
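
A dataset of that size could be generated with something like the sketch below. It is an assumption-laden illustration, not data from this thread: the class `SampleDataset`, the fixed seed, and the choice of two well-separated groups (around 0 and around 10) are all made up for the example. The point of the separation is only that a working clustering run should not report one identical center for every cluster.

```java
import java.util.Locale;
import java.util.Random;

public class SampleDataset {
    // Hypothetical generator: the first half of the points lie in [0, 1) per
    // dimension, the second half in [10, 11). A fixed seed keeps it reproducible.
    static double[][] generate(int n, int dims, long seed) {
        Random rnd = new Random(seed);
        double[][] points = new double[n][dims];
        for (int i = 0; i < n; i++) {
            double base = (i < n / 2) ? 0.0 : 10.0;
            for (int d = 0; d < dims; d++) {
                points[i][d] = base + rnd.nextDouble(); // noise in [0, 1)
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Print 10 points of 5 dimensions as plain text, one point per line.
        for (double[] p : generate(10, 5, 42L)) {
            StringBuilder line = new StringBuilder();
            for (int d = 0; d < p.length; d++) {
                if (d > 0) line.append(' ');
                line.append(String.format(Locale.ROOT, "%.3f", p[d]));
            }
            System.out.println(line);
        }
    }
}
```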

Robin

On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

I have a local version, which I submitted a long time ago, that I am using
on real data, and it is not giving the same point for all clusters. However,
I haven't tried the latest Mahout code. I kept my code outputting data as
text so that it is easy for me to verify. The current Mahout code, however,
outputs it as binary data (as a SequenceFile), so it is difficult to verify.


Thanks
Pallavi

Robin Anil wrote:

Have you verified the trunk code on some real data? I am getting the same
point for all clusters regardless of the distance measure.

Robin



On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:



Yes. It shouldn't be a problem. My point was that we are inheriting
numpoints as part of ClusterBase, though we are not using it in
SoftCluster.
Other than that, I don't see any issue w.r.t. functionality.


Thanks
Pallavi

Robin Anil wrote:



In the SoftCluster implementation, writeOut calculates the centroid and
writes it, and read(in) reads that centroid into the center.

In ClusterDumper it reads into the ClusterBase and calls
value.getCenter();
It should work normally, right?
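
The write/read behaviour described above can be illustrated with a small stand-alone sketch. Everything here is a hypothetical stand-in written for this example (the class `SoftClusterSketch`, its fields, and the `write`/`readFields` method names); it is not the Mahout code. It only shows the idea that the centroid is computed at write time and stored as the center at read time, so a later `getCenter()` call returns the averaged value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CentroidRoundTrip {
    // Hypothetical stand-in for a soft cluster; not the Mahout class.
    static class SoftClusterSketch {
        double[] weightedPointTotal;
        double totalWeight;
        double[] center;

        // On write: compute centroid = weightedPointTotal / totalWeight and emit it.
        void write(DataOutput out) throws IOException {
            out.writeInt(weightedPointTotal.length);
            for (double v : weightedPointTotal) {
                out.writeDouble(v / totalWeight);
            }
        }

        // On read: the stored centroid becomes the center.
        void readFields(DataInput in) throws IOException {
            int n = in.readInt();
            center = new double[n];
            for (int i = 0; i < n; i++) {
                center[i] = in.readDouble();
            }
        }

        double[] getCenter() {
            return center;
        }
    }

    // Serializes one sketch cluster to bytes and reads it back into a fresh one.
    static double[] roundTrip(double[] weightedTotal, double totalWeight) {
        try {
            SoftClusterSketch writer = new SoftClusterSketch();
            writer.weightedPointTotal = weightedTotal;
            writer.totalWeight = totalWeight;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writer.write(new DataOutputStream(bytes));

            SoftClusterSketch reader = new SoftClusterSketch();
            reader.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return reader.getCenter();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[] center = roundTrip(new double[] {4.0, 8.0}, 4.0);
        System.out.println(center[0] + " " + center[1]);
    }
}
```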

Robin



On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:





Yes, but not the total number of points. So the numpoints from ClusterBase
will not be used in SoftCluster. numpoints is specific to k-means, in the
same way that the weighted point total is specific to fuzzy k-means.


Robin Anil wrote:





The center is still the averaged-out centroid, right?
weightedtotalvector/totalprobWeight
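
To make the two averages discussed in this thread concrete, here is a minimal sketch in plain Java. The class `CentroidSketch` and both method names are made up for illustration, and the per-point weights are taken as given inputs; how those weights are derived from membership probabilities is not stated in this thread and is left out on purpose.

```java
public class CentroidSketch {
    // k-means style average: centroid = pointTotal / numPoints.
    static double[] kMeansCentroid(double[][] points) {
        int dims = points[0].length;
        double[] total = new double[dims];
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                total[d] += p[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            total[d] /= points.length;
        }
        return total;
    }

    // Fuzzy style average: centroid = weightedPointTotal / totalWeight.
    // weights[i] is the (assumed, precomputed) weight of points[i] for this cluster.
    static double[] fuzzyCentroid(double[][] points, double[] weights) {
        int dims = points[0].length;
        double[] weightedTotal = new double[dims];
        double totalWeight = 0.0;
        for (int i = 0; i < points.length; i++) {
            totalWeight += weights[i];
            for (int d = 0; d < dims; d++) {
                weightedTotal[d] += weights[i] * points[i][d];
            }
        }
        for (int d = 0; d < dims; d++) {
            weightedTotal[d] /= totalWeight;
        }
        return weightedTotal;
    }

    public static void main(String[] args) {
        double[][] points = { {0.0, 0.0}, {4.0, 0.0} };
        double[] plain = kMeansCentroid(points);
        double[] fuzzy = fuzzyCentroid(points, new double[] {3.0, 1.0});
        System.out.println("k-means: " + plain[0] + ", " + plain[1]);
        System.out.println("fuzzy:   " + fuzzy[0] + ", " + fuzzy[1]);
    }
}
```

With equal weights the two functions agree, which is why a point count is not needed on the fuzzy side: the total weight plays the role that numPoints plays for k-means.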



On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:







I haven't gone through ClusterDumper yet. However, ClusterBase holds the
number of points used for averaging (pointTotal/numPoints, as in k-means),
whereas SoftCluster holds a weighted point total. So I am wondering how we
can reuse ClusterBase here.


Thanks
Pallavi

Robin Anil wrote:







Yes, so that ClusterDumper can print it out.

On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:









Hi Robin,

When you say reusing ClusterBase, are you planning to extend ClusterBase
in SoftCluster? For example, SoftCluster extends ClusterBase?

Thanks
Pallavi


Robin Anil wrote:









I have been trying to convert the FuzzyKMeans SoftCluster (which should
ideally be named FuzzyKmeansCluster) to use ClusterBase.

I am getting *the same center* for all the clusters. To aid the conversion,
all I did was remove the center vector from the SoftCluster class and reuse
the one from ClusterBase. These changes make essentially no difference to
the tests, which pass correctly.

So I am questioning whether the implementation keeps the average center at
all. Has anyone who has used FuzzyKMeans experienced this?


Robin











