[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285684#comment-15285684 ] Edward J. Yoon commented on HAMA-941: - Quick comment from Greg Malewicz -- "There are many clustering algorithms. Perhaps it's better to start with why you need to group items, and then look at papers for an algorithm that has the desired grouping properties." > Semiclustering Termination > -- > > Key: HAMA-941 > URL: https://issues.apache.org/jira/browse/HAMA-941 > Project: Hama > Issue Type: Improvement > Components: examples, graph >Reporter: Edward J. Yoon >Priority: Minor > > Currently Semiclustering example will be terminated when the number of > iterations exceeded the predefined threshold max iteration. > App should be stopped if there's no cluster changes (I guess). Please check > and improve it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
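The improvement requested in the issue is to stop once the clusters no longer change, instead of always running to the maximum iteration count. It can be sketched as a loop that compares consecutive states. This is a minimal, framework-free sketch; the class and method names are hypothetical and not part of Hama's API. In a real BSP job the check would more likely be made per vertex (for example, a vertex voting to halt when its own semi-cluster list did not change in the last superstep), but the stopping rule is the same.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch (not Hama's API): apply one "superstep" at a time and stop
// as soon as the state no longer changes, keeping maxIterations as a safety cap.
class ConvergenceLoop {

  // Returns the number of supersteps executed.
  static <S> int iterationsUntilStable(S initial, UnaryOperator<S> superstep, int maxIterations) {
    S current = initial;
    for (int i = 1; i <= maxIterations; i++) {
      S next = superstep.apply(current);
      if (next.equals(current)) {
        return i; // nothing changed in this superstep: terminate early
      }
      current = next;
    }
    return maxIterations; // cap reached (the current termination rule)
  }
}
```

Here S would typically be a map from vertex ID to that vertex's list of semi-clusters. The approach relies on equals() comparing those lists element by element, as java.util.List and java.util.Map do.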
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285487#comment-15285487 ] Behroz Sikander commented on HAMA-941: -- That is awesome :). Thanks.
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285259#comment-15285259 ] Edward J. Yoon commented on HAMA-941: - Sure, I'll check. Greg, one of the original authors, also sits near me. :-) -- Best Regards, Edward J. Yoon
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284867#comment-15284867 ] Behroz Sikander commented on HAMA-941: -- Before the holidays, I was trying to understand how the Giraph implementation works, but I was not able to make much sense of it. So I posted a question on the Giraph mailing list, and so far I have not received any response: https://mail-archives.apache.org/mod_mbox/giraph-user/201605.mbox/%3CCAAp_xXEo%2BoO8%3DJvNZ15u%2BedEUNEJmy3uuUU7oCUDJZ4aa-T9eQ%40mail.gmail.com%3E If possible, could you give me some hints as to why the superstep 1 output is different than expected?
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273918#comment-15273918 ] Behroz Sikander commented on HAMA-941: -- No problem. Enjoy your holidays. Yes, my goal is to fix semi-clustering fully and submit a pull request. Once it is done, you can review the changes. P.S. I will also take 3-4 days of holiday next week.
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273903#comment-15273903 ] Edward J. Yoon commented on HAMA-941: - Sorry for the slow review; it's a holiday period in Korea and I'll be back next week. Can you please try to find the bug in the implementation? :-)
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273213#comment-15273213 ] Behroz Sikander commented on HAMA-941: -- I then tried to run a simple example against the algorithm. Here is the input:
{code}
1 2-1.0,3-1.0
2 1-1.0,3-2.0
3 1-1.0,2-2.0,4-2.0,5-1.0
4 3-2.0,5-1.0
5 3-1.0,4-1.0
{code}
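Each line of this input appears to be a vertex ID followed by its adjacency list, where every entry is written as `<neighbor>-<weight>`. The sketch below parses that format and checks that every edge has a matching reverse edge with the same weight. This is a quick sanity check that an undirected test graph was typed in consistently. The class is hypothetical (it is not Hama's input reader), and it assumes the vertex ID and the edge list are separated by whitespace.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hama): parses lines of the form
// "<vertexId> <dest>-<weight>,<dest>-<weight>,..." into an adjacency map.
class AdjacencyParser {

  static Map<String, Map<String, Double>> parse(List<String> lines) {
    Map<String, Map<String, Double>> graph = new HashMap<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      // Assumption: vertex ID and edge list are separated by whitespace (tab or space).
      String[] parts = trimmed.split("\\s+", 2);
      Map<String, Double> edges = new HashMap<>();
      if (parts.length == 2) {
        for (String edge : parts[1].split(",")) {
          // Split on the first '-' only, so "3-1.0" -> ("3", "1.0").
          int dash = edge.indexOf('-');
          if (dash <= 0) {
            throw new IllegalArgumentException("Malformed edge: " + edge);
          }
          edges.put(edge.substring(0, dash), Double.parseDouble(edge.substring(dash + 1)));
        }
      }
      graph.put(parts[0], edges);
    }
    return graph;
  }

  // True if every edge u->v with weight w has a matching edge v->u with weight w.
  static boolean isSymmetric(Map<String, Map<String, Double>> graph) {
    for (Map.Entry<String, Map<String, Double>> u : graph.entrySet()) {
      for (Map.Entry<String, Double> e : u.getValue().entrySet()) {
        Map<String, Double> back = graph.get(e.getKey());
        if (back == null || !e.getValue().equals(back.get(u.getKey()))) {
          return false;
        }
      }
    }
    return true;
  }
}
```

On the five-vertex input above, parse returns five vertices, vertex 3 has four neighbors, and isSymmetric returns true.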
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273209#comment-15273209 ] Behroz Sikander commented on HAMA-941: -- So I tried the semiclustering example again with a slightly changed input, and the test case fails, even though it should pass. In semiclustering.txt, I connected vertex 0 with vertex 1 and gave the edge a weight of 0.1. Here is the output, which contains only 9 clusters instead of 10; cluster C124546 is completely wrong.
C1647421 = [32, 35, 38, 36, 30, 33, 39, 31, 37, 34]
C1770553 = [71, 74, 77, 72, 75, 78, 70, 73, 76, 79]
C1616638 = [26, 23, 20, 29, 27, 24, 21, 22, 25, 28]
C124546 = [0, 10]
C1832119 = [95, 98, 92, 96, 90, 99, 93, 97, 91, 94]
C1739770 = [65, 68, 62, 63, 69, 66, 60, 64, 67, 61]
C1678204 = [44, 41, 47, 48, 42, 45, 49, 46, 40, 43]
C1801336 = [80, 89, 83, 86, 87, 81, 84, 85, 88, 82]
C1708987 = [53, 50, 56, 59, 51, 57, 54, 55, 58, 52]
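Output like this can be checked mechanically. The sketch below assumes, as the nine correct clusters above suggest (the test itself is not quoted here), that the expected result for semiclustering.txt is ten clusters, each holding exactly one block of ten consecutive vertex IDs (0-9, 10-19, up to 90-99). The helper is hypothetical and only parses lines of the form `Cxxx = [a, b, c]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical test helper (not part of Hama). Assumes each expected cluster is one
// block of ten consecutive vertex IDs starting at a multiple of ten.
class ClusterOutputCheck {

  // Parses a line such as "C124546 = [0, 10]" into its sorted member IDs.
  static TreeSet<Integer> members(String line) {
    String body = line.substring(line.indexOf('[') + 1, line.lastIndexOf(']'));
    TreeSet<Integer> ids = new TreeSet<>();
    for (String id : body.split(",")) {
      if (!id.trim().isEmpty()) {
        ids.add(Integer.parseInt(id.trim()));
      }
    }
    return ids;
  }

  // True if the IDs are exactly {10k, 10k+1, ..., 10k+9} for some k.
  static boolean isExpectedBlock(TreeSet<Integer> ids) {
    if (ids.size() != 10 || ids.first() % 10 != 0) {
      return false;
    }
    // Ten distinct integers whose min and max differ by 9 must be consecutive.
    return ids.last() == ids.first() + 9;
  }

  // Returns the cluster IDs (the text before " = ") whose members are not an expected block.
  static List<String> unexpectedClusters(List<String> lines) {
    List<String> bad = new ArrayList<>();
    for (String line : lines) {
      if (!isExpectedBlock(members(line))) {
        bad.add(line.substring(0, line.indexOf('=')).trim());
      }
    }
    return bad;
  }
}
```

Applied to the nine lines above, it would flag only C124546. The missing tenth cluster shows up separately, as a line count of 9 instead of 10.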
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265625#comment-15265625 ] Edward J. Yoon commented on HAMA-941: - I just pasted the patch from the clipboard between \{code\} tags.
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265538#comment-15265538 ] Behroz Sikander commented on HAMA-941: -- The clusters seem to be generated correctly now. Yes, the factor should be a user-defined parameter; I can change that. I will check the code again for other problems. By the way, how do you create a patch and paste it into a JIRA comment?
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265535#comment-15265535 ] Behroz Sikander commented on HAMA-941: -- Thank you. I will try the patch and will let you know if I find something else.
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265533#comment-15265533 ] Edward J. Yoon commented on HAMA-941: - P.S. The initial code can be found at HAMA-594, and I changed a few things because it didn't work correctly.
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265532#comment-15265532 ] Edward J. Yoon commented on HAMA-941: - First, the boundary score factor (fB) seems to always be 0.0, although it is meant to be a user-defined parameter. Second, if the vertex count is at most 1 (vC <= 1), the score should be 1.0. Please apply my patch and test again. Do you see more bugs?
{code}
diff --git a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
       candidates.add(msg);
 
       if (!msg.contains(this.getVertexID())
-          && msg.size() == semiClusterMaximumVertexCount) {
+          && msg.size() < semiClusterMaximumVertexCount) {
         SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
         msgNew.addVertex(this);
         msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
    * @return the value to calcualte the Score of a semi-cluster.
    */
   public double semiClusterScoreCalcuation(SemiClusterMessage message) {
-    double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
-    int vC = 0, eC = 0;
+    // TODO fB is the bounday score factor. This should be configurable by user
+    // the default is 0.5
+    double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+    int vC = 0;
     vC = message.size();
     for (Vertex<Text, DoubleWritable, SemiClusterMessage> v : message
         .getVertexList()) {
       List<Edge<Text, DoubleWritable>> eL = v.getEdges();
       for (Edge<Text, DoubleWritable> e : eL) {
-        eC++;
         if (message.contains(e.getDestinationVertexID())
             && e.getValue() != null) {
           iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
         }
       }
     }
+
     if (vC > 1)
-      sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+      sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+    else
+      sC = 1.0;
+
     return sC;
   }
 
{code}
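For reference, the Pregel paper defines the score of a semi-cluster c as S_c = (I_c - f_B * B_c) / (V_c * (V_c - 1) / 2). Here I_c is the total weight of internal edges, B_c is the total weight of boundary edges, V_c is the number of vertices, and f_B is the user-specified boundary edge score factor, usually between 0 and 1. The patch drops the extra division by the edge count eC, which brings the code in line with this formula. The sketch below is not Hama's code. It takes f_B as an argument, in the spirit of the patch's TODO. As an assumption of this sketch, it represents the graph as a list of undirected edges so that each internal edge is counted exactly once.

```java
import java.util.List;
import java.util.Set;

// Sketch of the Pregel semi-cluster score (not Hama's implementation):
//   S_c = (I_c - f_B * B_c) / (V_c * (V_c - 1) / 2)
class SemiClusterScore {

  // An undirected weighted edge, listed once (assumption of this sketch).
  static final class Edge {
    final int u;
    final int v;
    final double weight;

    Edge(int u, int v, double weight) {
      this.u = u;
      this.v = v;
      this.weight = weight;
    }
  }

  static double score(Set<Integer> cluster, List<Edge> edges, double boundaryFactor) {
    int vC = cluster.size();
    if (vC <= 1) {
      return 1.0; // same convention as the patch: clusters of at most one vertex score 1.0
    }
    double iC = 0.0; // I_c: weight of edges with both endpoints inside the cluster
    double bC = 0.0; // B_c: weight of edges with exactly one endpoint inside the cluster
    for (Edge e : edges) {
      boolean inU = cluster.contains(e.u);
      boolean inV = cluster.contains(e.v);
      if (inU && inV) {
        iC += e.weight;
      } else if (inU || inV) {
        bC += e.weight;
      }
    }
    // vC * (vC - 1) is always even, so this integer division is exact.
    int cliqueEdges = vC * (vC - 1) / 2;
    return (iC - boundaryFactor * bC) / cliqueEdges;
  }
}
```

Take the five-vertex example input posted on this issue with f_B = 0.5, the patch's default. For the cluster {1, 2, 3}, I_c = 4.0 and B_c = 3.0, so its score is (4.0 - 1.5) / 3, about 0.833.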
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265329#comment-15265329 ] Behroz Sikander commented on HAMA-941: -- According to the Giraph implementation, our condition is wrong and should be something like the following: if (!contains && cluster.vertices.size() < clusterCapacity) { }
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265326#comment-15265326 ] Behroz Sikander commented on HAMA-941: -- In the past few days, I went through the material on semi-clustering. I have developed a basic understanding of the algorithm, but there is no real example implementation available online. Then I studied the semi-clustering code in Hama, and I do not think the algorithm is working properly. The score of a semi-cluster stays at 1.0 and never changes (this 1.0 value is set in the constructor of the SemiClusterDetails class). Stranger still, the code that actually calculates the score (semiClusterScoreCalcuation) never runs! In the compute() method of the SemiClusteringVertex class, the following condition is never satisfied, so the score is never calculated:
if (!msg.contains(this.getVertexID()) && msg.size() == semiClusterMaximumVertexCount) {
  ...
  msgNew.setScore(semiClusterScoreCalcuation(msgNew));
  ...
}
How should we proceed? I can look into the Giraph implementation of semi-clustering and try to find out what the problem with ours is. (https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SemiClustering.java)
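A toy model shows why the score is never computed. Assume that every vertex starts by sending a singleton semi-cluster (size 1) and that a cluster only grows inside this branch. Then, for any maximum above 1, the original `msg.size() == semiClusterMaximumVertexCount` guard can never become true, so no cluster ever grows past one vertex. The `<` guard used by the Giraph implementation lets a cluster grow by one vertex per superstep until it reaches the maximum. The model below is hypothetical and ignores everything except cluster size.

```java
// Hypothetical toy model (not Hama's code): follows one cluster message that is
// forwarded each superstep to a vertex not yet in it (so !msg.contains(...) holds),
// and applies the extension guard from compute() to decide whether that vertex joins.
class ExtensionGuardDemo {

  // Returns the cluster size after the given number of supersteps.
  static int sizeAfter(int supersteps, int maxVertexCount, boolean fixedGuard) {
    int size = 1; // every vertex starts by sending a singleton semi-cluster
    for (int step = 0; step < supersteps; step++) {
      boolean extend = fixedGuard
          ? size < maxVertexCount    // Giraph-style guard (as in the patch)
          : size == maxVertexCount;  // original guard
      if (extend) {
        size++; // corresponds to msgNew.addVertex(this); the score would be recomputed here
      }
    }
    return size;
  }
}
```

With a maximum of 10, the original guard leaves the cluster at size 1 no matter how many supersteps run. The corrected guard reaches 10 after nine supersteps and stays there.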
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255255#comment-15255255 ] Behroz Sikander commented on HAMA-941: -- Material:
http://www.comp.nus.edu.sg/~ooibc/vldb15-recovery.pdf
http://cedric.cnam.fr/~crucianm/src/BriefSurveyClustering.pdf
https://wiki.apache.org/hama/SemiClustering
http://stackoverflow.com/questions/11293919/what-is-the-significance-of-the-semi-clustering-formula-in-the-google-pregel-pap
https://paxtonryan.wordpress.com/tag/semi-clustering/
http://people.apache.org/~edwardyoon/documents/pregel.pdf
https://kowshik.github.io/JPregel/pregel_paper.pdf