And the outputs of Mapper, Combiner and Reducer

Robin


On Wed, Feb 17, 2010 at 7:58 PM, Robin Anil <robin.a...@gmail.com> wrote:

> I tracked the versions back to before the change to Writables was made.
> There is no significant change in the code.
>
> Can you give me a small dataset, maybe 10 points in 5 dimensions, so that I
> can verify the trunk in that case?
>
> Robin
>
> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <
> pallavi.pall...@corp.aol.com> wrote:
>
>> I have a local version, which I submitted long back, that I am using on
>> real data, and it is not giving the same point for all clusters. However, I
>> haven't tried the latest Mahout code. I kept my code outputting data as
>> text so that it is easy for me to verify. The current Mahout code, however,
>> outputs binary data (as a SequenceFile), so it is difficult to verify.
>>
>>
>> Thanks
>> Pallavi
>>
>> Robin Anil wrote:
>>
>>> Have you verified the trunk code on some real data? I am getting the same
>>> point for all clusters regardless of the distance measure.
>>>
>>> Robin
>>>
>>>
>>>
>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <
>>> pallavi.pall...@corp.aol.com> wrote:
>>>
>>>
>>>
>>>> Yes. It shouldn't be a problem. My point was that we are extending
>>>> numpoints as part of ClusterBase, though we are not using it in
>>>> SoftCluster.
>>>> Other than that, I don't see any issue w.r.t. functionality.
>>>>
>>>>
>>>> Thanks
>>>> Pallavi
>>>>
>>>> Robin Anil wrote:
>>>>
>>>>
>>>>
>>>>> In the impl of SoftCluster, on writeOut it calculates the centroid and
>>>>> writes it, and on read(in) it reads the centroid into the center.
>>>>>
>>>>> In ClusterDumper it reads into the ClusterBase and calls
>>>>> value.getCenter();
>>>>> It should work normally, right?
>>>>>
>>>>> Robin
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <
>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Yes. But not the total number of points. So, the numpoints from
>>>>>> ClusterBase will not be used in SoftCluster. numpoints is specific to
>>>>>> k-means, similar to the weighted point total for fuzzy k-means.
>>>>>>
>>>>>>
>>>>>> Robin Anil wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> The center is still the averaged-out centroid, right?
>>>>>>> weightedtotalvector/totalprobWeight
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <
>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I haven't yet gone through ClusterDumper. However, ClusterBase would
>>>>>>>> have the number of points to average out (pointTotal/numPoints as per
>>>>>>>> k-means), whereas SoftCluster will have a weighted point total. So, I
>>>>>>>> am wondering how we can reuse ClusterBase here?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Pallavi
>>>>>>>>
>>>>>>>> Robin Anil wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Yes, so that the cluster dumper can print it out.
>>>>>>>>>
>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <
>>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hi Robin,
>>>>>>>>>>
>>>>>>>>>> By reusing ClusterBase, do you mean that you are planning to extend
>>>>>>>>>> ClusterBase in SoftCluster? For example, SoftCluster extends
>>>>>>>>>> ClusterBase?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Pallavi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I have been trying to convert FuzzyKMeans SoftCluster (which should
>>>>>>>>>>> ideally be named FuzzyKmeansCluster) to use the ClusterBase.
>>>>>>>>>>>
>>>>>>>>>>> I am getting *the same center* for all the clusters. To aid the
>>>>>>>>>>> conversion, all I did was remove the center vector from the
>>>>>>>>>>> SoftCluster class and reuse the one from the ClusterBase. This
>>>>>>>>>>> essentially makes no change to the tests, which pass correctly.
>>>>>>>>>>>
>>>>>>>>>>> So I am questioning whether the implementation keeps the average
>>>>>>>>>>> center at all? Has anyone who has used FuzzyKMeans experienced this?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Robin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
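
For readers following the thread, the two averaging rules being compared
(pointTotal/numPoints for k-means, weightedtotalvector/totalprobWeight for
fuzzy k-means) can be sketched in plain Java. This is only an illustration
under stated assumptions, not the Mahout implementation: the class
CentroidSketch, its method names, and the use of double[] in place of Mahout
vectors are all hypothetical, and the per-point weights are assumed to be
already computed for one cluster.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Mahout.
public class CentroidSketch {

    // K-means style average: centroid = pointTotal / numPoints.
    public static double[] kMeansCentroid(double[][] points) {
        int dims = points[0].length;
        double[] pointTotal = new double[dims];
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                pointTotal[d] += p[d];
            }
        }
        double[] centroid = new double[dims];
        for (int d = 0; d < dims; d++) {
            centroid[d] = pointTotal[d] / points.length;
        }
        return centroid;
    }

    // Fuzzy k-means style average:
    // centroid = weightedPointTotal / totalProbWeight.
    // weights[i] is the (assumed precomputed) weight of point i for this cluster.
    public static double[] fuzzyCentroid(double[][] points, double[] weights) {
        int dims = points[0].length;
        double[] weightedPointTotal = new double[dims];
        double totalProbWeight = 0.0;
        for (int i = 0; i < points.length; i++) {
            totalProbWeight += weights[i];
            for (int d = 0; d < dims; d++) {
                weightedPointTotal[d] += weights[i] * points[i][d];
            }
        }
        double[] centroid = new double[dims];
        for (int d = 0; d < dims; d++) {
            centroid[d] = weightedPointTotal[d] / totalProbWeight;
        }
        return centroid;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {2.0, 4.0}};
        // Plain average of the two points.
        System.out.println(Arrays.toString(kMeansCentroid(points)));
        // Weighted average; the second point counts three times as much.
        System.out.println(
            Arrays.toString(fuzzyCentroid(points, new double[] {1.0, 3.0})));
    }
}
```

Note that the weighted form never needs numPoints, which matches the point
made in the thread. It also follows from the formula that if every cluster is
given the same weights for every point, every cluster ends up with the same
center.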
