[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265538#comment-15265538
 ] 

Behroz Sikander commented on HAMA-941:
--

Clusters seem to be generated correctly now. Yes, the factor should be a 
user-defined parameter. I can change that. 

I will check the code again for other problems.

By the way, how do you create a patch and paste it into a Jira comment?

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently Semiclustering example will be terminated when the number of 
> iterations exceeded the predefined threshold max iteration.
> App should be stopped if there's no cluster changes (I guess). Please check 
> and improve it.
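
The termination rule requested above can be sketched outside of Hama. This is only an illustration under assumptions: the class `TerminationSketch`, the method `iterationsUntilStable`, and the `step` function are hypothetical names, and a real Hama job would have to detect "no cluster changes" across all vertices rather than in a single loop.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Standalone sketch of "stop when nothing changes"; not Hama code. */
final class TerminationSketch {

  /**
   * Applies step repeatedly and returns the number of iterations executed.
   * Stops early once an iteration leaves the cluster set unchanged, and
   * otherwise stops at maxIterations (the behaviour the issue describes today).
   */
  static int iterationsUntilStable(Set<String> initial,
      UnaryOperator<Set<String>> step, int maxIterations) {
    Set<String> current = initial;
    for (int i = 1; i <= maxIterations; i++) {
      Set<String> next = step.apply(current);
      if (next.equals(current)) {
        return i; // no cluster changed in this iteration
      }
      current = next;
    }
    return maxIterations;
  }

  public static void main(String[] args) {
    // Hypothetical step: grow the set by one element until it holds three.
    UnaryOperator<Set<String>> step = s -> {
      if (s.size() >= 3) {
        return s;
      }
      Set<String> grown = new HashSet<>(s);
      grown.add("v" + s.size());
      return grown;
    };
    System.out.println(iterationsUntilStable(Set.of("v0"), step, 10));
  }
}
```

The iteration cap stays in place as a safety net, so a run that never stabilises still ends.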



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265535#comment-15265535
 ] 

Behroz Sikander commented on HAMA-941:
--

Thank you. I will try the patch and will let you know if I find something else.




[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265533#comment-15265533
 ] 

Edward J. Yoon commented on HAMA-941:
-

P.S. The initial code can be found at HAMA-594. I changed a few things because 
it didn't work correctly.



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265532#comment-15265532
 ] 

Edward J. Yoon commented on HAMA-941:
-

First of all, it looks like the boundary score factor is always 0.0. This is a 
user-defined parameter. Second, if the vertex count is at most 1 (vC <= 1), the 
score should be 1.0. Please apply my patch and test again. Do you see any more bugs? 

{code}
diff --git a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
 candidates.add(msg);
 
 if (!msg.contains(this.getVertexID())
-&& msg.size() == semiClusterMaximumVertexCount) {
+&& msg.size() < semiClusterMaximumVertexCount) {
   SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
   msgNew.addVertex(this);
   msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
* @return the value to calcualte the Score of a semi-cluster.
*/
   public double semiClusterScoreCalcuation(SemiClusterMessage message) {
-double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
-int vC = 0, eC = 0;
+// TODO fB is the bounday score factor. This should be configurable by user
+// the default is 0.5
+double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+int vC = 0;
 vC = message.size();
 for (Vertex v : message
 .getVertexList()) {
   List> eL = v.getEdges();
   for (Edge e : eL) {
-eC++;
 if (message.contains(e.getDestinationVertexID())
 && e.getValue() != null) {
   iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
 }
   }
 }
+
 if (vC > 1)
-  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+else
+  sC = 1.0;
+
 return sC;
   }
{code}
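
The score formula in the patch can be tried in isolation with a small sketch. It does not use the Hama classes: `SemiClusterScoreSketch` and `WeightedEdge` are hypothetical stand-ins, and it assumes that edges leaving the cluster are summed into bC, which the hunk above does not show.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Standalone sketch of the patched score formula; these are not the Hama classes. */
final class SemiClusterScoreSketch {

  /** Hypothetical stand-in for a directed, weighted out-edge. */
  static final class WeightedEdge {
    final String source;
    final String target;
    final double weight;

    WeightedEdge(String source, String target, double weight) {
      this.source = source;
      this.target = target;
      this.weight = weight;
    }
  }

  /** Returns (iC - fB * bC) / (vC * (vC - 1) / 2), or 1.0 when vC <= 1. */
  static double score(Set<String> members, List<WeightedEdge> edges, double fB) {
    int vC = members.size();
    if (vC <= 1) {
      return 1.0;
    }
    double iC = 0.0; // total weight of edges inside the cluster
    double bC = 0.0; // total weight of edges leaving the cluster (assumption)
    for (WeightedEdge e : edges) {
      if (!members.contains(e.source)) {
        continue; // only out-edges of cluster members are considered
      }
      if (members.contains(e.target)) {
        iC += e.weight;
      } else {
        bC += e.weight;
      }
    }
    return (iC - fB * bC) / (vC * (vC - 1) / 2.0);
  }

  public static void main(String[] args) {
    Set<String> cluster = new HashSet<>(Arrays.asList("a", "b"));
    List<WeightedEdge> edges = Arrays.asList(
        new WeightedEdge("a", "b", 2.0),  // internal edge
        new WeightedEdge("a", "c", 1.0)); // boundary edge
    System.out.println(score(cluster, edges, 0.5)); // (2.0 - 0.5 * 1.0) / 1 = 1.5
  }
}
```

With fB left at 0.0, as in the code before the patch, the boundary term drops out and the same cluster scores 2.0, so boundary edges would never lower a score.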



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265329#comment-15265329
 ] 

Behroz Sikander commented on HAMA-941:
--

According to the Giraph implementation, our condition is wrong and should be 
something like the following:

if (!contains && cluster.vertices.size() < clusterCapacity) {

}
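
The difference between the two guards can be checked with a small standalone sketch. The names `ClusterGrowthSketch`, `canJoin`, and `canJoinOriginal` are hypothetical, and the reasoning assumes that every semi-cluster starts with a single vertex and grows only through this branch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Standalone comparison of the two guards; not the Hama or Giraph classes. */
final class ClusterGrowthSketch {

  /** Suggested guard: join only if absent and the cluster still has room. */
  static boolean canJoin(Set<String> cluster, String vertexId, int capacity) {
    return !cluster.contains(vertexId) && cluster.size() < capacity;
  }

  /** Current guard, kept for comparison: requires a cluster that is already full. */
  static boolean canJoinOriginal(Set<String> cluster, String vertexId, int capacity) {
    return !cluster.contains(vertexId) && cluster.size() == capacity;
  }

  public static void main(String[] args) {
    Set<String> cluster = new HashSet<>(Arrays.asList("a"));
    int capacity = 3;
    System.out.println(canJoin(cluster, "b", capacity));         // true
    System.out.println(canJoinOriginal(cluster, "b", capacity)); // false
  }
}
```

Under that assumption, a cluster of one vertex can never reach a capacity greater than one with the `==` guard, because growing the cluster is exactly what the guard prevents.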



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265326#comment-15265326
 ] 

Behroz Sikander commented on HAMA-941:
--

Over the past few days, I went through the material on semi-clustering. I have 
developed a basic understanding of the algorithm, but there is no real example 
implementation available online. 

Then I studied the semi-clustering code implemented in Hama. I do not think 
that the semi-clustering algorithm is working properly. The score of a semi-cluster 
remains 1.0 and never changes (this 1.0 value is defined in the constructor 
of the SemiClusterDetails class). Further, the strangest thing is that the code 
that actually calculates the score (semiClusterScoreCalcuation) never fires.

In the "compute" method of the "SemiClusteringVertex" class, the following 
condition is never satisfied, so the score is never calculated.

if (!msg.contains(this.getVertexID())
&& msg.size() == semiClusterMaximumVertexCount) {
.
msgNew.setScore(semiClusterScoreCalcuation(msgNew));
.
}

How should we proceed? I can look into the Giraph implementation of 
semi-clustering and try to find out what the problem is with ours. 
(https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SemiClustering.java)
