[ 
https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-845:
--------------------------------

    Fix Version/s:     (was: 0.6)
                   0.7

I downloaded the latest patch and it no longer applies cleanly. Given how 
close we are to the 0.6 code freeze and the lack of an assignee, I'm moving 
the issue to release 0.7.
                
> Make cluster top terms code more reusable
> -----------------------------------------
>
>                 Key: MAHOUT-845
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-845
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Frank Scholten
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-845.patch, MAHOUT-845.patch, MAHOUT-845.patch
>
>
> When working with Mahout text clustering I find that I keep writing code 
> similar to the contents of
>
>   public static String getTopFeatures(Cluster cluster, String[] dictionary, int numTerms)
>
> in ClusterDumper in order to determine cluster labels.
> I think it would be useful if (parts of) this code were added to the cluster 
> or vector API so that you could do something like
>
>   Cluster cluster = ... // get the cluster from seq file iterable
>   String clusterLabel = cluster.getTopTerms(1, dictionary);
>   // Do something with the label
>
> I think this would make it easier to export and post-process clustering 
> results, for example indexing them or storing them elsewhere.
> Thoughts?
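
For illustration only, here is a rough sketch of the kind of reusable helper 
the description asks for. It is not the attached patch and not Mahout API: 
the class TopTerms, the method topTerms, and the use of a plain double[] in 
place of a Mahout cluster centroid are all assumptions made so that the 
example is self-contained. Skipping zero weights and breaking ties by 
dictionary index are also choices of this sketch, not documented behaviour of 
ClusterDumper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Mahout: picks the highest-weighted
// dictionary terms of a cluster centroid given as a dense weight array.
final class TopTerms {
    private TopTerms() {}

    /**
     * Returns up to numTerms dictionary entries, heaviest centroid weight first.
     * centroidWeights[i] is assumed to be the weight of dictionary[i].
     */
    static List<String> topTerms(double[] centroidWeights, String[] dictionary, int numTerms) {
        if (centroidWeights.length != dictionary.length) {
            throw new IllegalArgumentException("weights and dictionary must have the same length");
        }
        // Collect the indices of terms that actually occur in the centroid.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < centroidWeights.length; i++) {
            if (centroidWeights[i] != 0.0) {
                indices.add(i);
            }
        }
        // Sort by descending weight; the sort is stable, so ties keep index order.
        indices.sort(Comparator.comparingDouble((Integer i) -> centroidWeights[i]).reversed());

        // Map the leading indices back to their dictionary terms.
        List<String> terms = new ArrayList<>();
        int limit = Math.min(numTerms, indices.size());
        for (int k = 0; k < limit; k++) {
            terms.add(dictionary[indices.get(k)]);
        }
        return terms;
    }

    public static void main(String[] args) {
        double[] weights = {0.1, 0.9, 0.0, 0.5};
        String[] dictionary = {"apache", "mahout", "jira", "cluster"};
        System.out.println(topTerms(weights, dictionary, 2)); // prints [mahout, cluster]
    }
}
```

With something along these lines, the cluster label from the example in the 
description would simply be the first element of the list returned for 
numTerms = 1.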

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
