[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265538#comment-15265538 ]

Behroz Sikander commented on HAMA-941:
--------------------------------------

Clusters seem to be generated correctly now. Yes, the factor should be a user-defined parameter; I can change that. I will check the code again for other problems.

By the way, how do you create a patch and paste it into a Jira comment?

> Semiclustering Termination
> --------------------------
>
>                 Key: HAMA-941
>                 URL: https://issues.apache.org/jira/browse/HAMA-941
>             Project: Hama
>          Issue Type: Improvement
>          Components: examples, graph
>            Reporter: Edward J. Yoon
>            Priority: Minor
>
> Currently the Semiclustering example terminates only when the number of
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop when there are no more cluster changes (I guess). Please check
> and improve it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
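The termination rule requested in the issue description (stop when clusters no longer change, with the iteration limit kept as a fallback) could be sketched roughly as below. This is only an illustration: the class name, the method name, and the idea of comparing sets of cluster IDs between supersteps are assumptions made for the example and are not taken from the Hama code or from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a "no cluster changes" stop rule; not Hama code. */
final class TerminationSketch {

  /**
   * Returns true when the run should stop: either the iteration limit has
   * been reached, or the cluster IDs did not change since the last superstep.
   */
  static boolean shouldStop(Set<String> previousClusters,
      Set<String> currentClusters, int superstep, int maxIterations) {
    boolean limitReached = superstep >= maxIterations;
    boolean unchanged = currentClusters.equals(previousClusters);
    return limitReached || unchanged;
  }

  public static void main(String[] args) {
    Set<String> before = new HashSet<>(Arrays.asList("C1", "C2"));
    Set<String> after = new HashSet<>(Arrays.asList("C1", "C2"));
    // Same clusters in two consecutive supersteps, well below the limit.
    System.out.println(shouldStop(before, after, 3, 10));
  }
}
```

Keeping the iteration limit in the condition means a run whose clusters keep oscillating still ends, which matches the current behaviour described in the issue.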
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265535#comment-15265535 ]

Behroz Sikander commented on HAMA-941:
--------------------------------------

Thank you. I will try the patch and will let you know if I find something else.
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265533#comment-15265533 ]

Edward J. Yoon commented on HAMA-941:
-------------------------------------

P.S. The initial code can be found at HAMA-594. I changed a few things because it did not work correctly.
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265532#comment-15265532 ]

Edward J. Yoon commented on HAMA-941:
-------------------------------------

First, the boundary score factor appears to always be 0.0, although it is meant to be a user-defined parameter. Second, if the vertex count is at most 1 (vC <= 1), the score should be 1.0. Please apply my patch and test again. Do you see any more bugs?

{code}
diff --git a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
         candidates.add(msg);
         if (!msg.contains(this.getVertexID())
-            && msg.size() == semiClusterMaximumVertexCount) {
+            && msg.size() < semiClusterMaximumVertexCount) {
           SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
           msgNew.addVertex(this);
           msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
    * @return the value to calcualte the Score of a semi-cluster.
    */
   public double semiClusterScoreCalcuation(SemiClusterMessage message) {
-    double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
-    int vC = 0, eC = 0;
+    // TODO fB is the bounday score factor. This should be configurable by user
+    // the default is 0.5
+    double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+    int vC = 0;
     vC = message.size();
     for (Vertex<Text, DoubleWritable, SemiClusterMessage> v : message
         .getVertexList()) {
       List<Edge<Text, DoubleWritable>> eL = v.getEdges();
       for (Edge<Text, DoubleWritable> e : eL) {
-        eC++;
         if (message.contains(e.getDestinationVertexID())
             && e.getValue() != null) {
           iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
         }
       }
     }
+
     if (vC > 1)
-      sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+      sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+    else
+      sC = 1.0;
+
     return sC;
   }
{code}
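To see what the patched score formula computes, it can be pulled out of the vertex class into a small standalone sketch. The class and parameter names below are made up for the illustration; only the formula, the 0.5 default for fB, and the 1.0 score for clusters of at most one vertex come from the patch.

```java
/** Standalone sketch of the patched score formula; not the Hama class itself. */
final class SemiClusterScoreSketch {

  /**
   * @param innerWeight    iC: sum of the weights of edges inside the cluster
   * @param boundaryWeight bC: sum of the weights of edges leaving the cluster
   * @param boundaryFactor fB: boundary score factor (0.5 in the patch)
   * @param vertexCount    vC: number of vertices in the cluster
   */
  static double score(double innerWeight, double boundaryWeight,
      double boundaryFactor, int vertexCount) {
    if (vertexCount <= 1) {
      // The patch assigns a fixed score when there is at most one vertex.
      return 1.0;
    }
    // Same integer arithmetic as the patch: vC * (vC - 1) is always even,
    // so the division by 2 is exact.
    int possibleEdges = (vertexCount * (vertexCount - 1)) / 2;
    return (innerWeight - boundaryFactor * boundaryWeight) / possibleEdges;
  }

  public static void main(String[] args) {
    // Three vertices, inner weight 3.0, boundary weight 2.0, fB = 0.5.
    System.out.println(score(3.0, 2.0, 0.5, 3));
    // With fB = 0.0 the boundary edges no longer lower the score.
    System.out.println(score(3.0, 2.0, 0.0, 3));
  }
}
```

With fB fixed at 0.0, as in the code before the patch, the boundary term drops out entirely, which is why making the factor configurable matters for the result.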
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265329#comment-15265329 ]

Behroz Sikander commented on HAMA-941:
--------------------------------------

According to the Giraph implementation, our condition is wrong and should be something like the following:

if (!contains && cluster.vertices.size() < clusterCapacity) {
}
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265326#comment-15265326 ]

Behroz Sikander commented on HAMA-941:
--------------------------------------

Over the past few days I went through the material on semi-clustering. I have a basic understanding of the algorithm, but I could not find a real example implementation online, so I studied the semi-clustering code in Hama.

I do not think the semi-clustering algorithm is working properly. The score of a semi-cluster stays at 1.0 and never changes (this 1.0 value is set in the constructor of the SemiClusterDetails class). Stranger still, the method that actually calculates the score (semiClusterScoreCalcuation) is never called. In the "compute" method of the "SemiClusteringVertex" class, the following condition is never satisfied, so the score is never calculated:

if (!msg.contains(this.getVertexID())
    && msg.size() == semiClusterMaximumVertexCount) {
  .
  msgNew.setScore(semiClusterScoreCalcuation(msgNew));
  .
}

How should we proceed? I can look into the Giraph implementation of semi-clustering and try to find out what is wrong with ours:
https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SemiClustering.java
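A small simulation may help show why the `==` check blocks cluster growth while a `<` check does not. It assumes that every semi-cluster starts from a single vertex and that the maximum vertex count is greater than 1; the thread does not state either point explicitly. The class and method names are hypothetical and the cluster is modelled as a plain list of vertex IDs rather than a SemiClusterMessage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of the two size checks; not the Hama classes. */
final class ClusterGrowthSketch {

  /** Size check as in the code under discussion: only full clusters pass. */
  static boolean equalsCheck(List<String> cluster, String id, int max) {
    return !cluster.contains(id) && cluster.size() == max;
  }

  /** Size check with "<": clusters that still have room pass. */
  static boolean lessThanCheck(List<String> cluster, String id, int max) {
    return !cluster.contains(id) && cluster.size() < max;
  }

  /** Offers each candidate to a one-vertex cluster and counts the additions. */
  static int grow(boolean useLessThan, int max, List<String> candidates) {
    List<String> cluster = new ArrayList<>();
    cluster.add("seed"); // assumption: a cluster starts with one vertex
    int added = 0;
    for (String id : candidates) {
      boolean accept = useLessThan
          ? lessThanCheck(cluster, id, max)
          : equalsCheck(cluster, id, max);
      if (accept) {
        cluster.add(id);
        added++; // this is the point where the score would be recomputed
      }
    }
    return added;
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("a", "b", "c", "d");
    System.out.println("== check added: " + grow(false, 3, candidates));
    System.out.println("<  check added: " + grow(true, 3, candidates));
  }
}
```

Under these assumptions the `==` variant never extends the cluster, because a one-vertex cluster can only reach the maximum size by passing the very check that rejects it.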