[ https://issues.apache.org/jira/browse/FLINK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702049#comment-15702049 ]
ASF GitHub Bot commented on FLINK-2549:
---------------------------------------

GitHub user joseprupi opened a pull request:

    https://github.com/apache/flink/pull/2885

    Damping

    This is the first commit of the bulk implementation of the Affinity
    Propagation algorithm. The calculations appear to be correct for the
    example in the pull request (the results have been compared with the
    scikit implementation).

    Some items are still pending; I am opening the pull request now to
    track the implementation and to have the graph reviewed.

    The graph for the implementation:
    https://docs.google.com/drawings/d/1PC3S-6AEt2Gp_TGrSfiWzkTcL7vXhHSxvM6b9HglmtA/edit?usp=sharing

    Still pending:

    - Convergence
    - Greg's suggestions:
      * Generic label type instead of Long.
      * Line 37, the join of similarities and messages (availabilities):
        we can perform a combinable reduce for the top 2 scores (and top
        label) rather than fully sorting all scores. There is an
        outstanding ticket, FLINK-2549, to add a topK operator. I expect
        that with k = 2 a custom implementation would be faster than topK
        implemented with a min-heap.
      * Enable object reuse, reuse objects, and use LongValue and
        DoubleValue rather than Long and Double. Rather than
        "return new ..." we can create the object as a class field and
        then update and return this object in the function.
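The top-2 suggestion can be illustrated without any Flink dependency. The sketch below is not code from the pull request: `Top2Accumulator`, `merge`, and `maxExcluding` are hypothetical names, and it assumes each label occurs at most once per group. It keeps only the best score, its label, and the runner-up score, and merges two partial results in place, which is also the spirit of the object-reuse suggestion. Because the merge can be applied to partial results in any grouping, the same logic could presumably serve as the body of a combinable reduce.

```java
/**
 * Hypothetical accumulator holding the best score, its label, and the
 * second-best score for one group. Assumes labels are unique per group.
 */
final class Top2Accumulator {
    long bestLabel;
    double best;
    double second = Double.NEGATIVE_INFINITY;

    Top2Accumulator(long label, double score) {
        this.bestLabel = label;
        this.best = score;
    }

    /** Merges other into this accumulator in place and returns this (no allocation). */
    Top2Accumulator merge(Top2Accumulator other) {
        if (other.best > this.best) {
            // The other side wins: the runner-up is our old best or the other's runner-up.
            this.second = Math.max(this.best, other.second);
            this.best = other.best;
            this.bestLabel = other.bestLabel;
        } else {
            // We keep the lead: the other's best may still beat our runner-up.
            this.second = Math.max(this.second, other.best);
        }
        return this;
    }

    /** Maximum score over all labels except the given one. */
    double maxExcluding(long label) {
        return label == bestLabel ? second : best;
    }

    public static void main(String[] args) {
        Top2Accumulator acc = new Top2Accumulator(1L, 0.3)
                .merge(new Top2Accumulator(2L, 0.9))
                .merge(new Top2Accumulator(3L, 0.7));
        System.out.println(acc.bestLabel + " " + acc.best + " " + acc.second);
    }
}
```

With the two values available per group, the maximum over "every other label" is a constant-time lookup via `maxExcluding`, so no full sort of the scores is needed. On exact ties the reported label depends on merge order, which a real implementation would need to settle explicitly.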
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joseprupi/flink damping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2885

----
commit 509dc2f6987cf01ef04a5da2935cf1c9a7b59fec
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-13T14:21:14Z

    Working intermediat calculations

commit 2dd3f761dd1bbd39e91b9b9ce4604c6741bbaa4e
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-13T15:07:31Z

    First propagation commit

commit 04df0d31f1907427c507bc688dca0edda7b045e4
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-20T16:47:40Z

    Implementing 1, wrong availability results

commit 4149bde05e4615c6fc88169f898e3699e43f172b
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-26T16:39:55Z

    Damping working

commit 8a7bee11fdd25a8e810d0ae1bbbbf93b132b5eda
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-27T04:23:22Z

    Damping working

----

> Add topK operator for DataSet
> -----------------------------
>
>                 Key: FLINK-2549
>                 URL: https://issues.apache.org/jira/browse/FLINK-2549
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core, DataSet API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>
> topK is a common operation for user, it would be great to have it in Flink.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)