[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285684#comment-15285684
 ] 

Edward J. Yoon commented on HAMA-941:
-

Quick comment from Greg Malewicz -- "There are many clustering algorithms. 
Perhaps it's better to start with why you need to group items, and then look at 
papers for an algorithm that has the desired grouping properties."

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently the Semiclustering example terminates only when the number of 
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop if there are no cluster changes (I guess). Please check 
> and improve it.
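
The stopping rule proposed in the description can be sketched as follows. This is a minimal illustration, not code from Hama: the class and method names are hypothetical, clusters are modelled as plain sets of vertex ids, and how the "nothing changed" signal would be collected across vertices (for example by voting to halt or through an aggregator) is left open.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hama.
final class ConvergenceSketch {

  /** True when two supersteps produced the same clusters, ignoring list order. */
  static boolean unchanged(List<Set<Integer>> previous, List<Set<Integer>> current) {
    return new HashSet<>(previous).equals(new HashSet<>(current));
  }

  /** Stop early on convergence; keep the iteration cap as a safety net. */
  static boolean shouldStop(long superstep, long maxIterations, boolean noClusterChanged) {
    return noClusterChanged || superstep >= maxIterations;
  }
}
```

Keeping the maximum-iteration threshold as a fallback means a job that never stabilises still terminates.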



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285487#comment-15285487
 ] 

Behroz Sikander commented on HAMA-941:
--

That is awesome :). Thanks.



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285259#comment-15285259
 ] 

Edward J. Yoon commented on HAMA-941:
-

Sure, I'll check. Greg, the original author, also sits near me. :-)




-- 
Best Regards, Edward J. Yoon




[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284867#comment-15284867
 ] 

Behroz Sikander commented on HAMA-941:
--

Before the holidays, I was trying to understand how the Giraph implementation 
works, but I could not make much sense of it. So I posted a question on the 
Giraph mailing list, and I have not received any response so far.

https://mail-archives.apache.org/mod_mbox/giraph-user/201605.mbox/%3CCAAp_xXEo%2BoO8%3DJvNZ15u%2BedEUNEJmy3uuUU7oCUDJZ4aa-T9eQ%40mail.gmail.com%3E

If possible, could you give me some hints as to why the superstep 1 output 
is different than expected?



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-06 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273918#comment-15273918
 ] 

Behroz Sikander commented on HAMA-941:
--

No problem. Enjoy your holidays. 

Yes, my goal is to fix semi-clustering fully and submit a pull request. 
Once it is done, you can review the changes.

P.S. I will also take 3-4 days of holiday next week.



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-06 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273903#comment-15273903
 ] 

Edward J. Yoon commented on HAMA-941:
-

Sorry for the slow review; it's a Korean holiday period and I'll be back next 
week. Can you please try to find the bug in the implementation? :-)



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-05 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273213#comment-15273213
 ] 

Behroz Sikander commented on HAMA-941:
--

I then tried to run a simple example against the algorithm. Here is the input:
{code}
1   2-1.0,3-1.0
2   1-1.0,3-2.0
3   1-1.0,2-2.0,4-2.0,5-1.0
4   3-2.0,5-1.0
5   3-1.0,4-1.0
{code}
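
For reference, one line of the input above can be read as a vertex id followed by a comma-separated list of `neighbor-weight` pairs. The sketch below parses that layout; it assumes the two columns are separated by whitespace, and the class and method names are hypothetical rather than Hama's actual input reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for lines such as "3   1-1.0,2-2.0,4-2.0,5-1.0".
final class AdjacencyLineSketch {

  /** Returns the vertex id, i.e. the first whitespace-delimited token. */
  static String vertexId(String line) {
    return line.trim().split("\\s+", 2)[0];
  }

  /** Returns neighbor id -> edge weight, in the order listed on the line. */
  static Map<String, Double> parseEdges(String line) {
    String[] parts = line.trim().split("\\s+", 2);
    Map<String, Double> edges = new LinkedHashMap<>();
    if (parts.length < 2) {
      return edges; // vertex without outgoing edges
    }
    for (String edge : parts[1].split(",")) {
      int dash = edge.indexOf('-'); // the first '-' separates id and weight
      String neighbor = edge.substring(0, dash).trim();
      edges.put(neighbor, Double.parseDouble(edge.substring(dash + 1)));
    }
    return edges;
  }
}
```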




[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-05 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273209#comment-15273209
 ] 

Behroz Sikander commented on HAMA-941:
--

So, I tried the semiclustering example again with a slightly changed input, 
and the test case fails even though it should pass. In semiclustering.txt, 
I connected vertex 0 with vertex 1 and gave the edge a weight of 0.1. Here is 
the output, which contains only 9 clusters instead of 10, and cluster C124546 
has completely wrong output:
C1647421 = [32, 35, 38, 36, 30, 33, 39, 31, 37, 34]
C1770553 = [71, 74, 77, 72, 75, 78, 70, 73, 76, 79]
C1616638 = [26, 23, 20, 29, 27, 24, 21, 22, 25, 28]
C124546 = [0, 10]
C1832119 = [95, 98, 92, 96, 90, 99, 93, 97, 91, 94]
C1739770 = [65, 68, 62, 63, 69, 66, 60, 64, 67, 61]
C1678204 = [44, 41, 47, 48, 42, 45, 49, 46, 40, 43]
C1801336 = [80, 89, 83, 86, 87, 81, 84, 85, 88, 82]
C1708987 = [53, 50, 56, 59, 51, 57, 54, 55, 58, 52]



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265625#comment-15265625
 ] 

Edward J. Yoon commented on HAMA-941:
-

I just copied the patch to the clipboard and pasted it between \{code\} and \{code\} tags.



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265538#comment-15265538
 ] 

Behroz Sikander commented on HAMA-941:
--

Clusters seem to be generated correctly now. Yes, the factor should be a 
user-defined parameter. I can change that. 

I will check the code again for other problems.

By the way, how do you create a patch and paste it into a Jira comment?



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265535#comment-15265535
 ] 

Behroz Sikander commented on HAMA-941:
--

Thank you. I will try the patch and will let you know if I find something else.




[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265533#comment-15265533
 ] 

Edward J. Yoon commented on HAMA-941:
-

P.S. The initial code can be found at HAMA-594. I changed a few things because 
it doesn't work correctly.



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265532#comment-15265532
 ] 

Edward J. Yoon commented on HAMA-941:
-

First of all, it looks like the boundary score factor is always 0.0. It should 
be a user-defined parameter. Second, if the vertex count is at most 1 
(vC <= 1), the score should be 1.0. 
Please apply my patch and test again. Do you see more bugs? 

{code}
diff --git a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
 candidates.add(msg);
 
 if (!msg.contains(this.getVertexID())
-&& msg.size() == semiClusterMaximumVertexCount) {
+&& msg.size() < semiClusterMaximumVertexCount) {
   SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
   msgNew.addVertex(this);
   msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
* @return the value to calcualte the Score of a semi-cluster.
*/
   public double semiClusterScoreCalcuation(SemiClusterMessage message) {
-double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
-int vC = 0, eC = 0;
+// TODO fB is the bounday score factor. This should be configurable by user
+// the default is 0.5
+double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+int vC = 0;
 vC = message.size();
 for (Vertex v : message
 .getVertexList()) {
   List> eL = v.getEdges();
   for (Edge e : eL) {
-eC++;
 if (message.contains(e.getDestinationVertexID())
 && e.getValue() != null) {
   iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
 }
   }
 }
+
 if (vC > 1)
-  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+else
+  sC = 1.0;
+
 return sC;
   }
{code}
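
Read in isolation, the scoring rule after this patch is (iC - fB * bC) / (vC * (vC - 1) / 2), with a fixed score of 1.0 when the cluster has at most one vertex. The standalone sketch below restates that arithmetic outside the vertex class; the class name and parameter names are illustrative only, and the caller is assumed to have already summed the inner and boundary edge weights.

```java
// Illustrative restatement of the patched formula; not the Hama class itself.
final class SemiClusterScoreSketch {

  /**
   * @param innerWeight    sum of weights of edges inside the cluster (iC)
   * @param boundaryWeight sum of weights of edges leaving the cluster (bC)
   * @param vertexCount    number of vertices in the cluster (vC)
   * @param boundaryFactor boundary score factor (fB), 0.5 by default in the patch
   */
  static double score(double innerWeight, double boundaryWeight,
                      int vertexCount, double boundaryFactor) {
    if (vertexCount <= 1) {
      return 1.0; // single-vertex cluster, as in the patch's else branch
    }
    // Number of vertex pairs; normalises the score by cluster size.
    double pairs = vertexCount * (vertexCount - 1) / 2.0;
    return (innerWeight - boundaryFactor * boundaryWeight) / pairs;
  }
}
```

For example, a three-vertex cluster with inner weight 3.0, boundary weight 2.0 and fB = 0.5 scores (3.0 - 1.0) / 3, roughly 0.67.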



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265329#comment-15265329
 ] 

Behroz Sikander commented on HAMA-941:
--

According to the Giraph implementation, our condition is wrong and should be 
something like the following:

if (!contains && cluster.vertices.size() < clusterCapacity) {

}



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265326#comment-15265326
 ] 

Behroz Sikander commented on HAMA-941:
--

In the past few days, I went through the material on semi-clustering. I have 
developed a basic understanding of the algorithm, but there is no real example 
implementation available online. 

Then I studied the semi-clustering code implemented in Hama. I do not think 
that the semi-clustering algorithm is working properly. The score of a 
semi-cluster remains 1.0 and never changes (this 1.0 value is defined in the 
constructor of the SemiClusterDetails class). Further, the strangest thing is 
that the code that actually calculates the score (semiClusterScoreCalcuation) 
never runs.

In the "compute" method of the "SemiClusteringVertex" class, the following 
condition is never satisfied, and so the score is never calculated.

if (!msg.contains(this.getVertexID())
&& msg.size() == semiClusterMaximumVertexCount) {
.
msgNew.setScore(semiClusterScoreCalcuation(msgNew));
.
}
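
One possible reading of why the condition never holds (an assumption, not confirmed in this thread): if every vertex starts with a semi-cluster containing only itself, the message size is 1 and equals the maximum only when the maximum is 1, so no cluster ever grows. The sketch below contrasts the `==` test with a `<` test on plain lists; the class, method and flag names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "may this vertex join the cluster?" check.
final class ExtendConditionSketch {

  /** Returns a grown copy of the cluster if the check passes, else the cluster itself. */
  static List<Integer> extend(List<Integer> cluster, int vertexId,
                              int maxVertices, boolean useLessThan) {
    boolean sizeOk = useLessThan
        ? cluster.size() < maxVertices    // room left for one more vertex
        : cluster.size() == maxVertices;  // only true once the cluster is already full
    if (!cluster.contains(vertexId) && sizeOk) {
      List<Integer> grown = new ArrayList<>(cluster);
      grown.add(vertexId);
      return grown;
    }
    return cluster;
  }
}
```

With a maximum of 3, a single-vertex cluster is extended under `<` but left untouched under `==`; under `==` the only cluster that could be extended is one that is already full, which would then exceed the maximum.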

How should we proceed? I can look into the Giraph implementation of 
Semi-Clustering and try to find out what the problem with our 
semi-clustering is. 
(https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SemiClustering.java)



[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-23 Thread Behroz Sikander (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255255#comment-15255255
 ] 

Behroz Sikander commented on HAMA-941:
--

Material:
http://www.comp.nus.edu.sg/~ooibc/vldb15-recovery.pdf
http://cedric.cnam.fr/~crucianm/src/BriefSurveyClustering.pdf
https://wiki.apache.org/hama/SemiClustering
http://stackoverflow.com/questions/11293919/what-is-the-significance-of-the-semi-clustering-formula-in-the-google-pregel-pap
https://paxtonryan.wordpress.com/tag/semi-clustering/
http://people.apache.org/~edwardyoon/documents/pregel.pdf
https://kowshik.github.io/JPregel/pregel_paper.pdf
