[jira] [Commented] (FLINK-2549) Add topK operator for DataSet

ASF GitHub Bot (JIRA) Mon, 28 Nov 2016 06:14:38 -0800

    [ https://issues.apache.org/jira/browse/FLINK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702049#comment-15702049 ]


ASF GitHub Bot commented on FLINK-2549:
---------------------------------------

GitHub user joseprupi opened a pull request:

    https://github.com/apache/flink/pull/2885

    Damping

    This is the first commit of the bulk implementation of the Affinity 
Propagation algorithm. The calculations appear to be correct for the example 
in the pull request (the results were compared with the scikit implementation). 
Some items are still pending; I am opening the pull request now to track the 
implementation and to have the graph reviewed.
    
    The graph for the implementation:
    
    
https://docs.google.com/drawings/d/1PC3S-6AEt2Gp_TGrSfiWzkTcL7vXhHSxvM6b9HglmtA/edit?usp=sharing
    
    Still pending:
    
    - Convergence
    - Greg's suggestions:
    
    * Generic label type instead of Long.
    
    * At line 37, in the join of similarities and messages (availabilities), 
we can perform a combinable reduce for the top 2 scores (and top label) rather 
than fully sorting all scores. There is an outstanding ticket, FLINK-2549, to 
add a topK operator. I expect that with k = 2 a custom implementation would be 
faster than topK implemented with a min-heap.
    
    * Enable object reuse, reuse objects, and use LongValue and DoubleValue 
rather than Long and Double. Rather than "return new ..." we can create the 
object as a class field and then update and return this object in the function.
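
    A minimal sketch of the combinable top-2 reduce suggested above, in plain 
Java with no Flink dependency. The class name Top2 and its fields and methods 
are hypothetical and are not taken from the pull request. It assumes the top 2 
scores are needed because the responsibility update takes, for each label k, 
the maximum score over all other labels: that maximum is the best score unless 
k is itself the top label, in which case it is the second-best score.

```java
/** Running top-2 scores for one point, plus the label of the best score. (Hypothetical helper.) */
final class Top2 {
    long bestLabel;   // label of the highest score seen so far
    double best;      // highest score seen so far
    double second;    // second-highest score seen so far

    Top2(long label, double score) {
        this.bestLabel = label;
        this.best = score;
        this.second = Double.NEGATIVE_INFINITY; // no second score yet
    }

    /**
     * Merges another partial result into this one and returns this.
     * Partial results can be merged in any grouping, which is what makes
     * the reduce combinable. The accumulator is updated in place instead
     * of allocating a new object per merge.
     */
    Top2 merge(Top2 other) {
        if (other.best > best) {
            // The other side wins: our old best competes with its runner-up.
            second = Math.max(best, other.second);
            best = other.best;
            bestLabel = other.bestLabel;
        } else {
            // We keep the best: the other side's best competes for second place.
            second = Math.max(second, other.best);
        }
        return this;
    }

    /** Maximum score over all labels except the given one. */
    double maxExcluding(long label) {
        return label == bestLabel ? second : best;
    }

    public static void main(String[] args) {
        // Two partial results, as two combiners might produce them.
        Top2 a = new Top2(1L, 0.3).merge(new Top2(2L, 0.9));
        Top2 b = new Top2(3L, 0.5).merge(new Top2(4L, 0.7));
        Top2 all = a.merge(b);
        System.out.println(all.bestLabel + " " + all.best + " " + all.second); // prints "2 0.9 0.7"
    }
}
```

    Each merge is constant time and keeps three fields per point, so no sort 
and no heap is needed for k = 2. Updating the accumulator in place follows the 
same idea as the object reuse suggestion; a Flink version would presumably use 
LongValue and DoubleValue fields instead of the primitives shown here.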

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joseprupi/flink damping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2885
    
----
commit 509dc2f6987cf01ef04a5da2935cf1c9a7b59fec
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-13T14:21:14Z

    Working intermediat calculations

commit 2dd3f761dd1bbd39e91b9b9ce4604c6741bbaa4e
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-13T15:07:31Z

    First propagation commit

commit 04df0d31f1907427c507bc688dca0edda7b045e4
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-20T16:47:40Z

    Implementing 1, wrong availability results

commit 4149bde05e4615c6fc88169f898e3699e43f172b
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-26T16:39:55Z

    Damping working

commit 8a7bee11fdd25a8e810d0ae1bbbbf93b132b5eda
Author: Josep Rubio <jo...@datag.es>
Date:   2016-11-27T04:23:22Z

    Damping working

----


> Add topK operator for DataSet
> -----------------------------
>
>                 Key: FLINK-2549
>                 URL: https://issues.apache.org/jira/browse/FLINK-2549
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core, DataSet API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>
> topK is a common operation for users; it would be great to have it in Flink. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
