Tests are passing fine, but not when testing on Reuters.

On Wed, Feb 17, 2010 at 8:07 PM, Pallavi Palleti <pallavi.pall...@corp.aol.com> wrote:
> If we just need to verify with some sample dataset, we already have the
> data in the TestFuzzyKMeansClustering code. Won't that suffice? Otherwise, I
> need to manually generate a sample dataset, as I don't have such a small
> dataset with me. I am actually running on movielens data, using movie ratings
> as the vector (movie as dimension, rating as coefficient) and user as point.
>
> Thanks
> Pallavi
>
> Robin Anil wrote:
>
>> I tracked the versions back to before the change to Writables was done.
>> There is no significant change in the code.
>>
>> Can you give me a small dataset, maybe 10 points and 5 dimensions? I can
>> verify the trunk in that case.
>>
>> Robin
>>
>> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <
>> pallavi.pall...@corp.aol.com> wrote:
>>
>>> I have a local version which I submitted long back. I am using it
>>> on real data and it is not giving the same point for all clusters. However, I
>>> haven't tried with the latest mahout code. I have kept my code outputting
>>> data as text so that it is easy for me to verify. However, the current mahout
>>> code outputs it as binary data (as a sequencefile), so it is difficult to
>>> verify.
>>>
>>> Thanks
>>> Pallavi
>>>
>>> Robin Anil wrote:
>>>
>>>> Have you verified the trunk code on some real data? I am getting the same
>>>> point for all clusters regardless of the distance measure.
>>>>
>>>> Robin
>>>>
>>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <
>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>
>>>>> Yes. It shouldn't be a problem. My point was that we are extending
>>>>> numpoints as part of ClusterBase, though we are not using it in
>>>>> SoftCluster.
>>>>> Other than that, I don't see any issue w.r.t. functionality.
>>>>>
>>>>> Thanks
>>>>> Pallavi
>>>>>
>>>>> Robin Anil wrote:
>>>>>
>>>>>> In the impl of SoftCluster, on writeOut it calculates the centroid and
>>>>>> writes it, and on read(in) it reads the centroid into the center.
>>>>>>
>>>>>> In ClusterDumper it reads into the ClusterBase and does
>>>>>> value.getCenter();
>>>>>> It should work normally, right?
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <
>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>
>>>>>>> Yes. But not the total number of points. So, the numpoints from
>>>>>>> ClusterBase will not be used in SoftCluster. numpoints is specific to
>>>>>>> kmeans, similar to the weighted point total for fuzzy kmeans.
>>>>>>>
>>>>>>> Robin Anil wrote:
>>>>>>>
>>>>>>>> The center is still the averaged-out centroid, right?
>>>>>>>> weightedtotalvector/totalprobWeight
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <
>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>
>>>>>>>>> I haven't yet gone through ClusterDumper. However, ClusterBase would
>>>>>>>>> have the number of points to average out (pointTotal/numPoints as per
>>>>>>>>> kmeans), whereas SoftCluster will have a weighted point total. So, I am
>>>>>>>>> wondering how we can reuse ClusterBase here?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Pallavi
>>>>>>>>>
>>>>>>>>> Robin Anil wrote:
>>>>>>>>>
>>>>>>>>>> Yes. So that cluster dumper can print it out.
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <
>>>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Robin,
>>>>>>>>>>>
>>>>>>>>>>> When you say reusing ClusterBase, are you planning to extend
>>>>>>>>>>> ClusterBase in SoftCluster? For example, SoftCluster extends
>>>>>>>>>>> ClusterBase?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pallavi
>>>>>>>>>>>
>>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have been trying to convert FuzzyKMeans SoftCluster (which should
>>>>>>>>>>>> ideally be named FuzzyKmeansCluster) to use the ClusterBase.
>>>>>>>>>>>>
>>>>>>>>>>>> I am getting *the same center* for all the clusters. To aid the
>>>>>>>>>>>> conversion, all I did was remove the center vector from the
>>>>>>>>>>>> SoftCluster class and reuse the same from the ClusterBase. This
>>>>>>>>>>>> essentially makes no change in the tests, which pass correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> So I am questioning whether the implementation keeps the average
>>>>>>>>>>>> center at all? Has anyone who has used FuzzyKMeans experienced this?
>>>>>>>>>>>>
>>>>>>>>>>>> Robin
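
For reference, the two centroid formulas discussed in the thread (pointTotal/numPoints for kmeans, weightedtotalvector/totalprobWeight for fuzzy kmeans) can be sketched roughly as below. This is only an illustration and not the Mahout code: the class and method names (CentroidSketch, kmeansCenter, fuzzyCenter) are made up, plain double arrays stand in for Mahout vectors, and the per-point weights are assumed to be already computed by the caller.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Mahout.
class CentroidSketch {

  // kmeans style: every point counts once, so center = pointTotal / numPoints.
  static double[] kmeansCenter(double[][] points) {
    int dims = points[0].length;
    double[] center = new double[dims];
    for (double[] p : points) {
      for (int d = 0; d < dims; d++) {
        center[d] += p[d];               // accumulate pointTotal
      }
    }
    for (int d = 0; d < dims; d++) {
      center[d] /= points.length;        // divide by numPoints
    }
    return center;
  }

  // fuzzy kmeans style: each point contributes with a weight, so
  // center = weightedPointTotal / totalWeight. numPoints is never used.
  static double[] fuzzyCenter(double[][] points, double[] weights) {
    int dims = points[0].length;
    double[] center = new double[dims];
    double totalWeight = 0.0;
    for (int i = 0; i < points.length; i++) {
      totalWeight += weights[i];
      for (int d = 0; d < dims; d++) {
        center[d] += weights[i] * points[i][d];  // accumulate weighted total
      }
    }
    for (int d = 0; d < dims; d++) {
      center[d] /= totalWeight;
    }
    return center;
  }

  public static void main(String[] args) {
    double[][] points = {{1.0, 2.0}, {3.0, 4.0}};
    System.out.println(Arrays.toString(kmeansCenter(points)));
    System.out.println(Arrays.toString(fuzzyCenter(points, new double[] {1.0, 3.0})));
  }
}
```

With equal weights the two functions agree, which is why a shared center field can serve both cluster types as long as each class divides by its own denominator.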