The tests pass fine, but not when testing on Reuters.

On Wed, Feb 17, 2010 at 8:07 PM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:

> If we just need to verify with some sample dataset, we already have the
> data in the TestFuzzyKMeansClustering code. Won't that suffice? Otherwise, I
> need to manually generate a sample dataset, as I don't have such a small
> dataset with me. I am actually running on MovieLens data, using movie ratings
> as the vector (movie as dimension, rating as coefficient) and user as the point.
>
>
> Thanks
> Pallavi
>
> Robin Anil wrote:
>
>> I tracked the versions back to before the change to Writables was made.
>> There is no significant change in the code.
>>
>> Can you give me a small dataset, maybe 10 points in 5 dimensions, so that
>> I can verify the trunk just in case?
>>
>> Robin
>>
>> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <
>> pallavi.pall...@corp.aol.com> wrote:
>>
>>
>>
>>> I have a local version, which I submitted long ago, that I am using on
>>> real data, and it does not give the same point for all clusters. However,
>>> I haven't tried the latest Mahout code. My code outputs data as text so
>>> that it is easy for me to verify. The current Mahout code outputs binary
>>> data (as a SequenceFile), so it is difficult to verify.
>>>
>>>
>>> Thanks
>>> Pallavi
>>>
>>> Robin Anil wrote:
>>>
>>>
>>>
>>>> Have you verified the trunk code on some real data? I am getting the
>>>> same point for all clusters regardless of the distance measure.
>>>>
>>>> Robin
>>>>
>>>>
>>>>
>>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <
>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>
>>>>> Yes. It shouldn't be a problem. My point was that we inherit
>>>>> numpoints from ClusterBase even though we are not using it in
>>>>> SoftCluster.
>>>>> Other than that, I don't see any issue w.r.t. functionality.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Pallavi
>>>>>
>>>>> Robin Anil wrote:
>>>>>
>>>>>> In the implementation of SoftCluster, writeOut calculates the centroid
>>>>>> and writes it, and read(in) reads the centroid into the center.
>>>>>>
>>>>>> ClusterDumper reads into the ClusterBase and calls
>>>>>> value.getCenter();
>>>>>> It should work normally, right?
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <
>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>
>>>>>>> Yes, but not by the total number of points. So the numpoints from
>>>>>>> ClusterBase will not be used in SoftCluster. numpoints is specific
>>>>>>> to k-means, just as the weighted point total is specific to fuzzy
>>>>>>> k-means.
>>>>>>>
>>>>>>>
>>>>>>> Robin Anil wrote:
>>>>>>>
>>>>>>>> The center is still the averaged-out centroid, right?
>>>>>>>> weightedtotalvector/totalprobWeight
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <
>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>
>>>>>>>>> I haven't yet gone through ClusterDumper. However, ClusterBase
>>>>>>>>> would have the number of points to average over
>>>>>>>>> (pointTotal/numPoints, as in k-means), whereas SoftCluster will
>>>>>>>>> have the weighted point total. So I am wondering how we can
>>>>>>>>> reuse ClusterBase here?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Pallavi
>>>>>>>>>
>>>>>>>>> Robin Anil wrote:
>>>>>>>>>
>>>>>>>>>> Yes, so that ClusterDumper can print it out.
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <
>>>>>>>>>> pallavi.pall...@corp.aol.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Robin,
>>>>>>>>>>>
>>>>>>>>>>> When you talk about reusing ClusterBase, are you planning to
>>>>>>>>>>> extend ClusterBase in SoftCluster? For example, SoftCluster
>>>>>>>>>>> extends ClusterBase?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pallavi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have been trying to convert the FuzzyKMeans SoftCluster
>>>>>>>>>>>> (which should ideally be named FuzzyKmeansCluster) to use
>>>>>>>>>>>> the ClusterBase.
>>>>>>>>>>>>
>>>>>>>>>>>> I am getting *the same center* for all the clusters. To aid
>>>>>>>>>>>> the conversion, all I did was remove the center vector from
>>>>>>>>>>>> the SoftCluster class and reuse the one from the ClusterBase.
>>>>>>>>>>>> This essentially changes nothing in the tests, which pass
>>>>>>>>>>>> correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> So I am questioning whether the implementation keeps the
>>>>>>>>>>>> average center at all. Has anyone who has used FuzzyKMeans
>>>>>>>>>>>> experienced this?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Robin
>
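
For reference, the two averaging schemes contrasted in the thread
(pointTotal/numPoints for k-means versus
weightedtotalvector/totalprobWeight for fuzzy k-means) can be sketched as
below. This is only an illustration under stated assumptions, not the Mahout
implementation: the class CentroidSketch, the methods hardCenter and
softCenter, and the use of plain double arrays are all hypothetical, and the
weights are assumed to be per-point membership weights that have already been
computed.

```java
import java.util.Arrays;

// Hypothetical illustration class; it does not use any Mahout types.
class CentroidSketch {

  // k-means style: every point counts once, so center = pointTotal / numPoints.
  static double[] hardCenter(double[][] points) {
    int dims = points[0].length;
    double[] pointTotal = new double[dims];
    for (double[] p : points) {
      for (int d = 0; d < dims; d++) {
        pointTotal[d] += p[d];
      }
    }
    double[] center = new double[dims];
    for (int d = 0; d < dims; d++) {
      center[d] = pointTotal[d] / points.length;
    }
    return center;
  }

  // Fuzzy k-means style: point i contributes weights[i] * point, and the
  // divisor is the total weight rather than the number of points.
  // weights[i] is assumed to be the precomputed membership weight of point i.
  static double[] softCenter(double[][] points, double[] weights) {
    int dims = points[0].length;
    double[] weightedPointTotal = new double[dims];
    double totalWeight = 0.0;
    for (int i = 0; i < points.length; i++) {
      totalWeight += weights[i];
      for (int d = 0; d < dims; d++) {
        weightedPointTotal[d] += weights[i] * points[i][d];
      }
    }
    double[] center = new double[dims];
    for (int d = 0; d < dims; d++) {
      center[d] = weightedPointTotal[d] / totalWeight;
    }
    return center;
  }

  public static void main(String[] args) {
    double[][] points = {{0.0, 0.0}, {2.0, 4.0}};
    // Plain average of the two points.
    System.out.println(Arrays.toString(hardCenter(points)));
    // The second point has three times the weight, so the center moves toward it.
    System.out.println(Arrays.toString(softCenter(points, new double[] {1.0, 3.0})));
  }
}
```

Note that when every cluster assigns the same weights to the same points,
softCenter returns the same vector for every cluster, so near-uniform
memberships are one possible cause of identical centers that may be worth
ruling out.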
