[jira] [Commented] (FLINK-2549) Add topK operator for DataSet

ASF GitHub Bot (JIRA) Fri, 04 Dec 2015 08:26:31 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041701#comment-15041701
 ]


ASF GitHub Bot commented on FLINK-2549:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1161#issuecomment-162011423
  
    @ChengXiangLi Sorry for keeping you waiting. I have not forgotten about 
this pull request.
    
    There are two things in your comment:
    
      1. Exposing Managed Memory to UDFs (this pull request), which is much 
more convenient than going through the implementation of a deeply integrated 
operator.
    
      2. Efficiency for APIs like the Table API. The Table API works on managed 
memory already, since it sits on top of Flink's join/sort/etc. What you are 
hinting at is a lower-level interface where functions get the memory 
segments, rather than the row objects, and work directly on the memory 
segments. That has been a long-standing wish of mine as well, but it involves 
having a separate type of function that supports working with memory segments. 
It also requires handling records that are too large to fit into individual 
segments.
    
    I am still working on point (1). I aimed a bit too high with how I wanted 
to abstract that, but will continue.
    
    I would love to see point (2) at some point. If you are eager to drive 
point (2), I'd be very happy. We should probably have a chat and get this 
designed, as it involves quite a few things (exposing other abstractions, 
spanning records, etc.).
    



> Add topK operator for DataSet
> -----------------------------
>
>                 Key: FLINK-2549
>                 URL: https://issues.apache.org/jira/browse/FLINK-2549
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core, Java API, Scala API
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>
> topK is a common operation for users; it would be great to have it in Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
