And the outputs of the Mapper, Combiner and Reducer.

Robin

On Wed, Feb 17, 2010 at 7:58 PM, Robin Anil <robin.a...@gmail.com> wrote:

> I tracked the versions back to before the change to Writables was made.
> There is no significant change in the code.
>
> Can you give me a small dataset, maybe 10 points in 5 dimensions? I can
> verify the trunk in that case.
>
> Robin
>
> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <
> pallavi.pall...@corp.aol.com> wrote:
>
>> I have a local version which I submitted long back. I am using it on
>> real data, and it is not giving the same point for all clusters. However,
>> I haven't tried the latest Mahout code. I have kept my code outputting
>> data as text so that it is easy for me to verify. The current Mahout code
>> outputs binary data (as a SequenceFile), so it is difficult to verify.
>>
>> Thanks
>> Pallavi
>>
>> Robin Anil wrote:
>>
>>> Have you verified the trunk code on some real data? I am getting the
>>> same point for all clusters regardless of the distance measure.
>>>
>>> Robin
>>>
>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <
>>> pallavi.pall...@corp.aol.com> wrote:
>>>
>>>> Yes. It shouldn't be a problem. My point was that we are extending
>>>> numpoints as part of ClusterBase, though we are not using it in
>>>> SoftCluster.
>>>> Other than that, I don't see any issue w.r.t. functionality.
>>>>
>>>> Thanks
>>>> Pallavi
>>>>
>>>> Robin Anil wrote:
>>>>
>>>>> In the impl of SoftCluster, on writeOut it calculates the centroid and
>>>>> writes it, and on read(in) it reads the centroid into the center.
>>>>>
>>>>> In ClusterDumper it reads into the ClusterBase and does
>>>>> value.getCenter();
>>>>> It should work normally, right?
>>>>>
>>>>> Robin
>>>>>
>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <
>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>
>>>>>> Yes. But not the total number of points. So, the numpoints from
>>>>>> ClusterBase will not be used in SoftCluster. numpoints is specific to
>>>>>> Kmeans, similar to the weighted point total for fuzzy kmeans.
>>>>>>
>>>>>> Robin Anil wrote:
>>>>>>
>>>>>>> The center is still the averaged-out centroid, right?
>>>>>>> weightedtotalvector/totalprobWeight
>>>>>>>
>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <
>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>
>>>>>>>> I haven't yet gone through ClusterDumper. However, ClusterBase would
>>>>>>>> have the number of points to average out (pointTotal/numPoints as
>>>>>>>> per kmeans), whereas SoftCluster will have the weighted point total.
>>>>>>>> So, I am wondering how we can reuse ClusterBase here?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Pallavi
>>>>>>>>
>>>>>>>> Robin Anil wrote:
>>>>>>>>
>>>>>>>>> Yes. So that cluster dumper can print it out.
>>>>>>>>>
>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <
>>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Robin,
>>>>>>>>>>
>>>>>>>>>> When you mention reusing ClusterBase, are you planning to extend
>>>>>>>>>> ClusterBase in SoftCluster? For example, SoftCluster extends
>>>>>>>>>> ClusterBase?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Pallavi
>>>>>>>>>>
>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>
>>>>>>>>>>> I have been trying to convert FuzzyKMeans SoftCluster (which
>>>>>>>>>>> should ideally be named FuzzyKmeansCluster) to use the
>>>>>>>>>>> ClusterBase.
>>>>>>>>>>>
>>>>>>>>>>> I am getting *the same center* for all the clusters. To aid the
>>>>>>>>>>> conversion, all I did was remove the center vector from the
>>>>>>>>>>> SoftCluster class and reuse the same from the ClusterBase. This
>>>>>>>>>>> essentially makes no change in the tests, which pass correctly.
>>>>>>>>>>>
>>>>>>>>>>> So I am questioning whether the implementation keeps the average
>>>>>>>>>>> center at all? Has anyone who has used FuzzyKMeans experienced
>>>>>>>>>>> this?
>>>>>>>>>>>
>>>>>>>>>>> Robin
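
For reference, the two centroid formulas mentioned in the thread (pointTotal/numPoints for kmeans, weightedtotalvector/totalprobWeight for fuzzy kmeans) can be sketched as below. This is only an illustration on plain double arrays, not the Mahout code: the class name CentroidSketch, both method names, and the weights array (standing in for whatever per-point membership weight a cluster accumulates) are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Mahout.
public class CentroidSketch {

  // kmeans style: center = pointTotal / numPoints
  static double[] kmeansCenter(double[] pointTotal, int numPoints) {
    double[] center = new double[pointTotal.length];
    for (int i = 0; i < center.length; i++) {
      center[i] = pointTotal[i] / numPoints;
    }
    return center;
  }

  // fuzzy kmeans style: center = weighted point total / total weight.
  // weights[p] is the (assumed) membership weight of point p in this cluster.
  static double[] fuzzyCenter(double[][] points, double[] weights) {
    double[] weightedTotal = new double[points[0].length];
    double totalWeight = 0.0;
    for (int p = 0; p < points.length; p++) {
      for (int i = 0; i < weightedTotal.length; i++) {
        weightedTotal[i] += weights[p] * points[p][i];
      }
      totalWeight += weights[p];
    }
    for (int i = 0; i < weightedTotal.length; i++) {
      weightedTotal[i] /= totalWeight;
    }
    return weightedTotal;
  }

  public static void main(String[] args) {
    double[][] points = {{0.0, 0.0}, {4.0, 8.0}};
    // Unweighted average of the two points.
    System.out.println(Arrays.toString(kmeansCenter(new double[] {4.0, 8.0}, 2)));
    // The same two points, weighted 0.75 and 0.25 towards this cluster.
    System.out.println(Arrays.toString(fuzzyCenter(points, new double[] {0.75, 0.25})));
  }
}
```

The number of points never enters the fuzzy version, which is the point made above about numpoints not being used in SoftCluster. The sketch also shows that two clusters given identical weights for every point would end up with the same center; whether that is what happens in trunk is not established in this thread.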