[ https://issues.apache.org/jira/browse/FLINK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743836#comment-14743836 ]

ASF GitHub Bot commented on FLINK-2661:
---------------------------------------

Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/1124#issuecomment-140146467
  
    Sure, I'm not saying you shouldn't have opened this PR. I'm trying to save 
you some time.
    
    So, my opinion is that in order for this to be useful, we need 2 things:
    - Understand the performance implications of the method. When is it 
beneficial? What's the memory overhead? What's the pre-processing overhead? How 
does it depend on the input? What if I give a wrong threshold? etc.
    - Make this as automatic as possible. Ideally, the user would only have to 
give the combiner function and the threshold. Splitting and merging should be 
handled internally. Actually, we could probably find ways to determine the 
threshold automatically, too, based on the graph, the current load and the 
available memory.
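As one illustration of deriving the threshold from the graph alone (it ignores the current load and the available memory, which the comment also mentions), a hypothetical heuristic could place the threshold `k` population standard deviations above the mean in-degree. The function name `suggest_threshold`, the edge-list representation and the default `k = 2.0` are assumptions for this sketch, not part of the PR or of Gelly.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def suggest_threshold(edges, k=2.0):
    """Hypothetical heuristic: mean in-degree + k * population std-dev.

    `edges` is a list of (src, dst) pairs; the result is rounded up to an
    integer and is at least 1. Only the degree distribution is considered.
    """
    degrees = list(Counter(dst for _, dst in edges).values())
    if not degrees:
        return 1
    return max(1, math.ceil(mean(degrees) + k * pstdev(degrees)))


if __name__ == "__main__":
    ring = [(i, (i + 1) % 20) for i in range(20)]  # every vertex has in-degree 1
    hub = ring + [(i, "h") for i in range(20)]     # plus one vertex with in-degree 20
    print(suggest_threshold(ring), suggest_threshold(hub))
```

On a regular graph the standard deviation is zero, so the threshold equals the common degree and no vertex exceeds it; adding a single hub pushes the threshold above the typical degree but below the hub's degree, so only the hub would be split.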
    
    These are not easy tasks and will need a lot of work. That's why I'm 
proposing to link to the current state of this method from the Gelly docs, so 
that we can get feedback. We could create something like a "research projects 
on Gelly" page, where we link to work in progress from our roadmap tasks.
    
    That's just my view, but let's see what other people think :)


> Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-2661
>                 URL: https://issues.apache.org/jira/browse/FLINK-2661
>             Project: Flink
>          Issue Type: Task
>          Components: Gelly
>    Affects Versions: 0.10
>            Reporter: Andra Lungu
>            Assignee: Andra Lungu
>
> Skewed graphs raise unique challenges for computation models such as Gelly's 
> vertex-centric or GSA iterations, mainly because these approaches process 
> all vertices uniformly, regardless of the degree distribution. 
> In vertex-centric iterations, for instance, a high-degree node takes more 
> time to process its neighbors than the other nodes in the graph. It acts as 
> a straggler, causing the other nodes to remain idle until it finishes its 
> computation. 
> This issue can be mitigated by splitting a high-degree node into subnodes and 
> evenly distributing its edges among the resulting subvertices. The 
> computation will then be performed on these subvertices. 
> To this end, we should add a Splitting API on top of Gelly which can help:
> - determine skewed nodes 
> - split them
> - merge them back at the end of the computation, given a user-defined 
> combiner.
> To illustrate the usage of these methods, we should add an example as well as 
> a separate entry in the documentation.
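The three steps listed in the issue (determine skewed nodes, split them, merge them back with a user-defined combiner) can be sketched on a single machine as follows. This is a minimal Python sketch, not Gelly's API: the edge-list representation, the names `split_edges` and `aggregate_with_splitting`, the use of in-degree as the skew measure and the round-robin edge assignment are all assumptions for illustration. In a distributed setting the subvertices would be ordinary vertices that can be placed on different workers, which is where the load balancing comes from; here they are just tuple keys.

```python
import math
import operator
from collections import defaultdict


def split_edges(edges, threshold):
    """Rewrite every edge (src, dst) as (src, (dst, i)), i = subvertex index.

    A vertex whose in-degree d exceeds `threshold` is split into
    ceil(d / threshold) subvertices and its incoming edges are dealt out
    round-robin, so no subvertex receives more than `threshold` edges.
    Every other vertex keeps a single subvertex (dst, 0).
    """
    in_degree = defaultdict(int)
    for _, dst in edges:
        in_degree[dst] += 1
    parts = {v: math.ceil(d / threshold) if d > threshold else 1
             for v, d in in_degree.items()}
    assigned = defaultdict(int)  # incoming edges handed out so far, per vertex
    result = []
    for src, dst in edges:
        result.append((src, (dst, assigned[dst] % parts[dst])))
        assigned[dst] += 1
    return result


def aggregate_with_splitting(edges, values, threshold, combiner):
    """Neighborhood aggregation on the split graph, merged with `combiner`.

    Assumes `combiner` is associative and commutative (e.g. operator.add or
    max), so the same function computes each subvertex's partial result and
    merges the partials back into the original vertex.
    """
    partial = {}
    for src, sub in split_edges(edges, threshold):
        v = values[src]
        partial[sub] = combiner(partial[sub], v) if sub in partial else v
    merged = {}
    for (vertex, _), p in partial.items():
        merged[vertex] = combiner(merged[vertex], p) if vertex in merged else p
    return merged


if __name__ == "__main__":
    # Hub "h" has 10 in-edges; with threshold 4 it is split into 3 subvertices.
    edges = [(i, "h") for i in range(10)] + [("a", "b")]
    values = {i: i for i in range(10)}
    values["a"] = 5
    print(aggregate_with_splitting(edges, values, 4, operator.add))
    # {'h': 45, 'b': 5}
```

Splitting only changes which subvertex collects each edge, so the merged result matches the unsplit computation whenever the combiner is associative and commutative; a non-associative per-vertex function would need a separate merge step.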



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
