Hi all,

I asked Greg to start a discussion here about FLINK-3771 and FLINK-3772, to
make sure we're all on the same page about future Gelly development.

About a year ago we created the Gelly roadmap [1]. Many of these items have
been implemented and others were researched and either developed externally
or dropped. Graph translators and degree annotations were not in the
roadmap, but I personally think that they are both helpful features and I'm
in favor of adding them to Gelly.

That said, I find this a perfect timing for revisiting and updating our
Gelly development plans. We should update the roadmap so that it reflects
current state and future plans. This way, the community can have a clear
picture of what we are working on and what are upcoming features. This is
also very important for new contributors. They can always refer to the
roadmap to see what the community is interested in and we can avoid having
people spending their time on obsolete or dropped features. We should also
go through existing JIRAs and clean them up. Several are assigned to people
who have been inactive or might need help.

In the following days I will go through the roadmap and then start a new
thread where we can discuss features and agree on future plans. In the
meantime, it would be great if you give us your view on FLINK-3771 and
FLINK-3772.

Cheers,
-Vasia.

[1]: https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly

On 21 April 2016 at 18:52, Greg Hogan <c...@greghogan.com> wrote:

> Vasia and I are looking for additional feedback on FLINK-3772. This ticket
> [0] and PR [1] provides a set of graph algorithms which compute and store
> the degree for vertices and edges.
>
> Degree annotation is a basic component of many algorithms. For example,
> PageRank requires the vertex out-degree to evenly divide the vertex score
> among neighbors. The triangle enumeration algorithm in Gelly requires both
> source and target degree for each edge as an optimization. The Jaccard and
> Adamic-Adar similarities require the target degree for each edge.
>
> As discussed in the ticket, Graph has methods for outDegrees, inDegrees,
> and getDegrees. These are simple but difficult or impossible to use
> efficiently.
>
> Stepping back, I believe algorithms should be composable and reused where
> possible, not only to improve Flink but also to support users. Implementing
> algorithms as classes rather than Graph methods enables customization and
> optimization such as used here.
>
> One such optimization is CachingGraphAlgorithm which implicitly reuses
> DataSets. There is overlap between algorithms. From this ticket, annotating
> edges with source and target degree on an undirected graph can reuse vertex
> degree. Local clustering coefficient requires a triangle listing and global
> clustering coefficient requires a triangle count, there is no need to
> generate the list three times.
>
> Further optimizations include the use of mutable types, reusing sort order,
> avoidance of coGroup, configurable parallelism, and not unnecessarily
> materializing DataSets. I see all this as the expectation for inclusion in
> Flink.
>
> Greg
>
> [0] https://issues.apache.org/jira/browse/FLINK-3772
> [1] https://github.com/apache/flink/pull/1901/files
>

Reply via email to