[jira] [Issue Comment Edited] (MAHOUT-940) Clusterdumper - Get rid of map based implementation

Paritosh Ranjan (Issue Comment Edited) (JIRA) Mon, 02 Apr 2012 20:54:59 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244952#comment-13244952
 ]


Paritosh Ranjan edited comment on MAHOUT-940 at 4/3/12 3:52 AM:
----------------------------------------------------------------

1) Yes.
2) It would be a good idea to test before and after your code change, i.e. 
run all JUnit tests and do some manual testing with clusterdumper (using the 
new implementation, dump a cluster that was hitting OOM with the older 
implementation). This will confirm that the code works.

Also, you can compare the results before and after using the post processor, 
i.e. the results should be the same whether you use the map based or the post 
processor based implementation.

So, to make this testable, do not get rid of the older code; rather, provide 
an option to choose between the map based and the post processor based 
implementation. Later it can be decided which version to keep, i.e. only the 
new one or both.
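
To illustrate the suggestion above, here is a minimal, self-contained sketch of 
keeping both code paths behind one option and checking that they produce the 
same dump. Everything in it is hypothetical and for illustration only: the 
names ClusterDumpModes, Mode, ClusteredPoint and the dump methods are not 
Mahout APIs, and the "post processor based" path is a simplified stand-in that 
only avoids holding all vectors in a map; it does not call the real post 
processor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from Mahout.
final class ClusterDumpModes {

  /** The option that selects which implementation to run. */
  enum Mode { MAP_BASED, POST_PROCESSOR_BASED }

  /** One clustered point: the id of its cluster and a printable vector. */
  static final class ClusteredPoint {
    final int clusterId;
    final String vector;

    ClusteredPoint(int clusterId, String vector) {
      this.clusterId = clusterId;
      this.vector = vector;
    }
  }

  // Older style: collect every vector of every cluster in a map, then write.
  // Memory use grows with the total number of points.
  static String dumpMapBased(List<ClusteredPoint> points) {
    Map<Integer, List<String>> byCluster = new TreeMap<>();
    for (ClusteredPoint p : points) {
      byCluster.computeIfAbsent(p.clusterId, k -> new ArrayList<>()).add(p.vector);
    }
    StringBuilder out = new StringBuilder();
    for (Map.Entry<Integer, List<String>> e : byCluster.entrySet()) {
      out.append("cluster ").append(e.getKey()).append('\n');
      for (String v : e.getValue()) {
        out.append("  ").append(v).append('\n');
      }
    }
    return out.toString();
  }

  // Stand-in for the post processor based path: only the cluster ids are kept
  // in memory, and the points are scanned once per cluster while writing.
  // A real dumper would write to a file rather than a StringBuilder.
  static String dumpPostProcessorBased(List<ClusteredPoint> points) {
    TreeSet<Integer> ids = new TreeSet<>();
    for (ClusteredPoint p : points) {
      ids.add(p.clusterId);
    }
    StringBuilder out = new StringBuilder();
    for (int id : ids) {
      out.append("cluster ").append(id).append('\n');
      for (ClusteredPoint p : points) {
        if (p.clusterId == id) {
          out.append("  ").append(p.vector).append('\n');
        }
      }
    }
    return out.toString();
  }

  /** Dispatches on the option so both versions stay available for testing. */
  static String dump(Mode mode, List<ClusteredPoint> points) {
    return mode == Mode.MAP_BASED
        ? dumpMapBased(points)
        : dumpPostProcessorBased(points);
  }

  public static void main(String[] args) {
    List<ClusteredPoint> points = Arrays.asList(
        new ClusteredPoint(2, "[1.0, 2.0]"),
        new ClusteredPoint(1, "[0.5, 0.1]"),
        new ClusteredPoint(2, "[1.1, 2.2]"));
    String oldOut = dump(Mode.MAP_BASED, points);
    String newOut = dump(Mode.POST_PROCESSOR_BASED, points);
    // The before/after check: both implementations must give the same result.
    if (!oldOut.equals(newOut)) {
      throw new AssertionError("map based and post processor based dumps differ");
    }
    System.out.print(newOut);
  }
}
```

In the actual patch the same idea would apply to the clusterdumper options: 
run the same clustering output through both modes and compare the two dumps 
before deciding whether the map based version can be dropped.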
                
      was (Author: paritoshranjan):
    1) yes
2) It might be a good idea to do some testing before/after your code change. 
i.e. Running all Junit tests, and some manual testing using clusterdumper ( 
dump a cluster using new implementation which was getting OOM with the older 
implementation). It will make sure that the code is working.

Also, you can also try to test quality before after using the post processor. 
i.e. The results should be same, whether you use the map based or post 
processor based implementation.

So, to test it, do not get rid of the older coder, rather provide an option to 
use the map based/post processor based implementation. This will help in 
testing. Later it can be decided which version to keep i.e. new/both.
                  
> Clusterdumper - Get rid of map based implementation
> ---------------------------------------------------
>
>                 Key: MAHOUT-940
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-940
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>
> The current implementation of ClusterDumper puts clusters and their related 
> vectors in a map. This generally results in OOM.
> Since ClusterOutputProcessor is available now, the ClusterDumper will first 
> process the clusteredPoints, and then write the clusters to a local file. 
> The inability to properly read the clustering output because ClusterDumper 
> hits OOM is reported too often on the mailing list. This improvement will fix 
> that problem.

