[
https://issues.apache.org/jira/browse/FLINK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743766#comment-14743766
]
ASF GitHub Bot commented on FLINK-2661:
---------------------------------------
Github user andralungu commented on the pull request:
https://github.com/apache/flink/pull/1124#issuecomment-140131099
Hi @vasia ,
There was no clear rejection in the discussion, so I thought I'd try my
luck :)
This technique could be enabled by using a flag only if we introduce a node
split version of all the library methods. Otherwise, for newly written code,
users would still have to pass a combine function according to their algorithm.
They'd also have to pass the threshold that differentiates high-degree vertices
from low-degree vertices. What is more, they still need to split and aggregate
before and after each step.
These are aspects that cannot be guessed and that highly depend on the
algorithm.
Also, what I had in mind regarding this PR was that we would discuss and
try to improve the code a bit, rather than immediately discarding it :(
> Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs
> ---------------------------------------------------------------------------
>
> Key: FLINK-2661
> URL: https://issues.apache.org/jira/browse/FLINK-2661
> Project: Flink
> Issue Type: Task
> Components: Gelly
> Affects Versions: 0.10
> Reporter: Andra Lungu
> Assignee: Andra Lungu
>
> Skewed graphs raise unique challenges to computation models such as Gelly's
> vertex-centric or GSA iterations. This is mainly because of the fact that
> these approaches uniformly process vertices regardless of their degree
> distribution.
> In vertex-centric, for instance, a skewed node will take more time to process
> its neighbors compared to the other nodes in the graph. The first will act as
> a straggler causing the latter to remain idle until it finishes its
> computation.
> This issue can be mitigated by splitting a high-degree node into subnodes and
> evenly distributing the edges to the the resulted subvertices. The
> computation will then be performed on the split vertex.
> To this end, we should add a Splitting API on top of Gelly which can help:
> - determine skewed nodes
> - split them
> - merge them back at the end of the computation, given a user defined
> combiner.
> To illustrate the usage of these methods, we should add an example as well as
> a separate entry in the documentation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)