[
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307166#comment-15307166
]
ASF GitHub Bot commented on FLINK-1707:
---------------------------------------
GitHub user joseprupi opened a pull request:
https://github.com/apache/flink/pull/2053
Affinity Propagation
Hello,
I have added an implementation of Binary Affinity Propagation to Gelly.
The Jira issue is:
https://issues.apache.org/jira/browse/FLINK-1707
This is not an implementation of the original Affinity Propagation; the
details of the binary variant can be found at:
http://www.psi.toronto.edu/pubs2/2009/NC09%20-%20SimpleAP.pdf
I have also added a simple example of it. To check this implementation I
compared its results with those of the scikit implementation of the
original Affinity Propagation. Although the results are not exactly the
same, they make sense to me, and I think that by properly tuning the
epsilon in the Flink implementation and the number of iterations in scikit
we could get the same result. I haven't found another implementation of
this binary algorithm to compare the results against.
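For readers unfamiliar with the original algorithm used as the reference
in this comparison, here is a minimal plain-Python sketch of the standard
responsibility/availability message updates described by Frey and Dueck.
It is not the code in this pull request (which implements the binary
variant) and it is not scikit's implementation. The function name, the
damping factor, the fixed iteration count and the toy data are all
assumptions made for illustration.

```python
from statistics import median


def affinity_propagation(S, damping=0.5, iterations=200):
    """Original (non-binary) Affinity Propagation on a similarity matrix.

    S is an n x n list of lists (n >= 2); S[k][k] holds the preference of
    point k. Returns, for every point, the index of its chosen exemplar.
    Illustrative only: fixed iteration count, no convergence check.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate k that maximizes a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data (an assumption): two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
S = [[-(a - b) ** 2 for b in points] for a in points]
# Common heuristic: use the median off-diagonal similarity as preference.
pref = median(S[i][k] for i in range(len(S)) for k in range(len(S)) if i != k)
for k in range(len(S)):
    S[k][k] = pref
labels = affinity_propagation(S)
print(labels)  # one exemplar index per point
```

The preference on the diagonal controls how many exemplars emerge, and
the damping and iteration count control whether the messages settle, so
these are likely the kind of parameters that need to be aligned before
two implementations can be expected to agree.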
The example in scikit clusters stock prices. I've modified it to cluster
more stocks than the original one. The results using scikit are:
Cluster 1: Pepsi, Coca Cola, Kellogg
Cluster 2: Navistar
Cluster 3: Kimberly-Clark, Colgate-Palmolive, Procter Gamble, Kraft Foods
Cluster 4: Wal-Mart
Cluster 5: Comcast, Time Warner, Cablevision
Cluster 6: ConocoPhillips, Apple, GlaxoSmithKline, Microsoft, SAP, Pfizer,
Novartis, 3M, Sanofi-Aventis, IBM, Chevron, DuPont de Nemours, CVS, Total,
Caterpillar, Home Depot, Valero Energy, Yahoo, Exxon, Mc Donalds, Cisco,
Unilever
Cluster 7: Ryder, Sony, Amazon, Marriott, Canon, Texas instruments, Ford,
Toyota, Honda, HP, Mitsubishi, Xerox
Cluster 8: American express, Goldman Sachs, General Electrics, Wells Fargo,
Bank of America, AIG, JPMorgan Chase
Cluster 9: Raytheon, Boeing, Walgreen, Lookheed Martin, General Dynamics,
Northrop Grumman
The results using the Flink implementation are:
Cluster 1: CVS, Walgreen
Cluster 2: American express, Goldman Sachs, General Electrics, Wells Fargo,
Bank of America, AIG, JPMorgan Chase
Cluster 3: Navistar
Cluster 4: Kimberly-Clark, Colgate-Palmolive, Procter Gamble, Kellogg
Cluster 5: Ryder, Sony, Marriott, Canon, Texas instruments, Ford, Toyota,
Honda, HP, Mitsubishi, Xerox
Cluster 6: Amazon, Yahoo
Cluster 7: Pepsi, Coca Cola, Kraft Foods
Cluster 8: Comcast, Time Warner, Cablevision
Cluster 9: Raytheon, Boeing, Lookheed Martin, General Dynamics, Northrop
Grumman
Cluster 10: Wal-Mart
Cluster 11: ConocoPhillips, Apple, GlaxoSmithKline, Microsoft, SAP, Pfizer,
Novartis, 3M, Sanofi-Aventis, IBM, Chevron, DuPont de Nemours, Total,
Caterpillar, Home Depot, Valero Energy, Exxon, Mc Donalds, Cisco, Unilever
These results can be reproduced with the scikit branch in my Flink
repository and https://github.com/joseprupi/stockclusteringtest.
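One way to put a number on "not exactly the same" is to count the
fraction of item pairs on which two clusterings agree (the Rand index).
The sketch below is only an illustration and not part of the pull
request; the function name is an assumption, and the data is a small
subset of the two listings above, chosen because it contains the members
that moved between clusters. The score therefore says nothing about the
full listings.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    labels_a and labels_b map item name -> cluster id and must cover the
    same items (at least two). A pair agrees when both clusterings put
    the two items together, or both keep them apart.
    """
    items = sorted(labels_a)
    if set(items) != set(labels_b):
        raise ValueError("both clusterings must cover the same items")
    pairs = list(combinations(items, 2))
    agree = sum(
        (labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)


# Subset of the listings above; cluster ids are the numbers printed there.
scikit = {
    "Pepsi": 1, "Coca Cola": 1, "Kellogg": 1,
    "Kimberly-Clark": 3, "Colgate-Palmolive": 3,
    "Procter Gamble": 3, "Kraft Foods": 3,
}
flink = {
    "Pepsi": 7, "Coca Cola": 7, "Kraft Foods": 7,
    "Kimberly-Clark": 4, "Colgate-Palmolive": 4,
    "Procter Gamble": 4, "Kellogg": 4,
}
print(rand_index(scikit, flink))  # 11 of the 21 pairs agree
```

Cluster ids do not need to match between the two inputs, since only
"same cluster or not" is compared for each pair.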
As future work, I guess the original algorithm could be implemented, and
maybe also the Capacitated Affinity Propagation mentioned in the paper.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/joseprupi/flink master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2053.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2053
----
commit d3b4cd98fcd07b5bba265baccbb465b6edc22e09
Author: Josep Rubio
Date: 2016-05-31T02:31:41Z
Affinity Propagation
----
> Add an Affinity Propagation Library Method
> ------------------------------------------
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Reporter: Vasia Kalavri
> Assignee: Josep Rubió
> Priority: Minor
> Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing