[
https://issues.apache.org/jira/browse/FLINK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702049#comment-15702049
]
ASF GitHub Bot commented on FLINK-2549:
---------------------------------------
GitHub user joseprupi opened a pull request:
https://github.com/apache/flink/pull/2885
Damping
This is the first commit of the bulk implementation of the Affinity
Propagation algorithm. The calculations appear to be correct for the example
in the pull request (the results were compared with the scikit implementation).
Some items are still pending; I am opening the pull request now to track the
implementation and to get the graph reviewed.
The graph for the implementation:
https://docs.google.com/drawings/d/1PC3S-6AEt2Gp_TGrSfiWzkTcL7vXhHSxvM6b9HglmtA/edit?usp=sharing
Still pending:
- Convergence
- Greg's suggestions:
* Generic label type instead of Long.
* On line 37, in the join of similarities and messages (availabilities), we can
perform a combinable reduce for the top 2 scores (and top label) rather than
fully sorting all scores. There is an outstanding ticket, FLINK-2549, to add a
topK operator. I expect that with k = 2 a custom implementation would be faster
than topK implemented with a min-heap.
* Enable object reuse, reuse objects, and use LongValue and DoubleValue
rather than Long and Double. Rather than "return new ..." we can create the
object as a class field and then update and return this object in the function.
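As a rough illustration of the last two suggestions, here is a minimal sketch
in plain Java of a top-2 accumulator that is updated in place instead of
allocating a new object per record. It does not use the Flink API. The class
Top2, its fields, and its methods are hypothetical names and are not taken from
the pull request; the sketch also assumes Long labels and finite scores.
Because merge() combines two partial results into one, the same logic could in
principle back a combinable reduce.

```java
/** Hypothetical accumulator: the two highest scores seen and the label of the highest. */
final class Top2 {
    long bestLabel = -1L;                       // -1 marks an empty accumulator
    double best = Double.NEGATIVE_INFINITY;     // highest score so far
    double second = Double.NEGATIVE_INFINITY;   // second-highest score so far

    /** Folds one (label, score) pair into this accumulator, in place. */
    Top2 add(long label, double score) {
        if (score > best) {
            second = best;
            best = score;
            bestLabel = label;
        } else if (score > second) {
            second = score;
        }
        return this;
    }

    /** Merges another partial result into this one, in place. On ties the current label is kept. */
    Top2 merge(Top2 other) {
        if (other.best > best) {
            second = Math.max(best, other.second);
            best = other.best;
            bestLabel = other.bestLabel;
        } else {
            second = Math.max(second, other.best);
        }
        return this;
    }

    /** Highest score among all labels except the given one. */
    double maxExcluding(long label) {
        return label == bestLabel ? second : best;
    }
}

final class Top2Demo {
    public static void main(String[] args) {
        // Two partial results for the same row, as two combiners might produce them.
        Top2 left = new Top2().add(1L, 0.5).add(2L, 0.9);
        Top2 right = new Top2().add(3L, 0.7).add(4L, 0.1);
        left.merge(right);
        System.out.println(left.bestLabel + " " + left.best + " " + left.second);
    }
}
```

The maxExcluding() helper is there because the responsibility update needs,
for each candidate, the maximum over all other candidates. That value is the
top score, unless the candidate itself holds the top score, in which case it is
the second one. This is why k = 2 is enough and no full sort is needed.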
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/joseprupi/flink damping
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2885.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2885
----
commit 509dc2f6987cf01ef04a5da2935cf1c9a7b59fec
Author: Josep Rubio <[email protected]>
Date: 2016-11-13T14:21:14Z
Working intermediat calculations
commit 2dd3f761dd1bbd39e91b9b9ce4604c6741bbaa4e
Author: Josep Rubio <[email protected]>
Date: 2016-11-13T15:07:31Z
First propagation commit
commit 04df0d31f1907427c507bc688dca0edda7b045e4
Author: Josep Rubio <[email protected]>
Date: 2016-11-20T16:47:40Z
Implementing 1, wrong availability results
commit 4149bde05e4615c6fc88169f898e3699e43f172b
Author: Josep Rubio <[email protected]>
Date: 2016-11-26T16:39:55Z
Damping working
commit 8a7bee11fdd25a8e810d0ae1bbbbf93b132b5eda
Author: Josep Rubio <[email protected]>
Date: 2016-11-27T04:23:22Z
Damping working
----
> Add topK operator for DataSet
> -----------------------------
>
> Key: FLINK-2549
> URL: https://issues.apache.org/jira/browse/FLINK-2549
> Project: Flink
> Issue Type: New Feature
> Components: Core, DataSet API
> Reporter: Chengxiang Li
> Assignee: Chengxiang Li
> Priority: Minor
>
> topK is a common operation for users; it would be great to have it in Flink.