[
https://issues.apache.org/jira/browse/FLINK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742905#comment-14742905
]
ASF GitHub Bot commented on FLINK-2661:
---------------------------------------
Github user vasia commented on the pull request:
https://github.com/apache/flink/pull/1124#issuecomment-139959564
Hi @andralungu,
I was under the impression that we never really reached consensus on the
mailing list thread regarding this addition.
I definitely think we need to handle skewed graphs and you've done great
work, but I wouldn't add this to Gelly in its current state.
- This is a very recent method that has not been tested thoroughly (apart
from the experiments in your thesis work). Its benefits and overheads are not
yet well understood.
- The API certainly needs rethinking. This should be a transparent method
that is easy to activate with a flag/option. Right now, it seems to me
that it's too complicated to use and makes erroneous implementations easy.
For now I would suggest that we keep this in your personal repository and
link to it from the Gelly documentation as an additional/experimental feature.
Once we have a better understanding of the technique and have thought of a
nicer API, we can reconsider adding it. What do you think?
> Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs
> ---------------------------------------------------------------------------
>
> Key: FLINK-2661
> URL: https://issues.apache.org/jira/browse/FLINK-2661
> Project: Flink
> Issue Type: Task
> Components: Gelly
> Affects Versions: 0.10
> Reporter: Andra Lungu
> Assignee: Andra Lungu
>
> Skewed graphs pose unique challenges for computation models such as Gelly's
> vertex-centric or GSA iterations, mainly because these approaches process
> vertices uniformly, regardless of their degree.
> In vertex-centric iterations, for instance, a skewed node takes more time to
> process its neighbors than the other nodes in the graph. It acts as a
> straggler, causing the other nodes to remain idle until it finishes its
> computation.
> This issue can be mitigated by splitting a high-degree node into subnodes and
> evenly distributing its edges among the resulting subvertices. The
> computation is then performed on the subvertices.
> To this end, we should add a Splitting API on top of Gelly which can help:
> - determine skewed nodes
> - split them
> - merge them back at the end of the computation, given a user-defined
> combiner.
> To illustrate the usage of these methods, we should add an example as well as
> a separate entry in the documentation.
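
To make the three steps in the description above concrete (determine skewed nodes, split them, merge the partial results), here is a minimal sketch in plain Java. It deliberately does not use Gelly's API and is not the implementation from the pull request: the class `NodeSplittingSketch`, the methods `findSkewed`, `split` and `merge`, the degree threshold, and the round-robin edge distribution are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Plain-Java illustration of node splitting; hypothetical names, no Gelly API. */
class NodeSplittingSketch {

    /** Step 1: treat a vertex as skewed if its degree exceeds an assumed threshold. */
    static List<Long> findSkewed(Map<Long, List<Long>> adjacency, int threshold) {
        List<Long> skewed = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> entry : adjacency.entrySet()) {
            if (entry.getValue().size() > threshold) {
                skewed.add(entry.getKey());
            }
        }
        return skewed;
    }

    /** Step 2: distribute one vertex's neighbors round-robin over `parts` subvertices. */
    static List<List<Long>> split(List<Long> neighbors, int parts) {
        List<List<Long>> subvertices = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            subvertices.add(new ArrayList<>());
        }
        for (int i = 0; i < neighbors.size(); i++) {
            subvertices.get(i % parts).add(neighbors.get(i));
        }
        return subvertices;
    }

    /** Step 3: fold the subvertices' partial results with a user-defined combiner. */
    static <T> T merge(List<T> partials, BinaryOperator<T> combiner) {
        T result = partials.get(0);
        for (int i = 1; i < partials.size(); i++) {
            result = combiner.apply(result, partials.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> adjacency = new HashMap<>();
        adjacency.put(1L, Arrays.asList(2L, 3L, 4L, 5L, 6L, 7L)); // high-degree hub
        adjacency.put(2L, Arrays.asList(1L));
        adjacency.put(3L, Arrays.asList(1L, 2L));

        for (Long vertex : findSkewed(adjacency, 4)) {
            // Assumes parts <= degree, so no subvertex is left without neighbors.
            List<List<Long>> subvertices = split(adjacency.get(vertex), 3);

            // Stand-in for the per-subvertex computation: smallest neighbor id.
            List<Long> partials = new ArrayList<>();
            for (List<Long> neighbors : subvertices) {
                partials.add(Collections.min(neighbors));
            }

            Long merged = merge(partials, Long::min);
            System.out.println("vertex " + vertex + " -> " + merged);
        }
    }
}
```

The combiner passed to `merge` has to be associative and commutative (as `Long::min` is) for the merged value to match what the unsplit vertex would have computed; this is one place where a user could easily get an implementation wrong.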