[
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232197#comment-15232197
]
ASF GitHub Bot commented on FLINK-2909:
---------------------------------------
Github user vasia commented on a diff in the pull request:
https://github.com/apache/flink/pull/1807#discussion_r59026824
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge
represents a group of edges
vertex and edge in the output graph stores the common group value and the
number of represented elements.
{% top %}
+
+Graph Generators
+-----------
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
--- End diff --
Could we add an overview of how graph generators are created and used?
For example, that you pass the parameters to the specific generator and then
call `generate()` to get the graph.
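
As a rough illustration of the pattern described in this comment (parameters go to the specific generator, then `generate()` returns the graph), here is a minimal plain-Java sketch. It does not use the Gelly classes from this pull request: `GeneratorUsageSketch`, `CompleteGraphSketch`, and the edge-list return type are hypothetical stand-ins chosen only to show the shape of the call sequence.

```java
import java.util.ArrayList;
import java.util.List;

public class GeneratorUsageSketch {

    /** Hypothetical stand-in for a generator; not the Gelly API. */
    public static final class CompleteGraphSketch {
        private final int vertexCount;

        // Step 1: the parameters are passed to the specific generator.
        public CompleteGraphSketch(int vertexCount) {
            if (vertexCount < 0) {
                throw new IllegalArgumentException("vertexCount must be non-negative");
            }
            this.vertexCount = vertexCount;
        }

        // Step 2: generate() builds the graph, here as a list of directed
        // {source, target} pairs connecting every vertex to every other vertex.
        public List<int[]> generate() {
            List<int[]> edges = new ArrayList<>();
            for (int source = 0; source < vertexCount; source++) {
                for (int target = 0; target < vertexCount; target++) {
                    if (source != target) {
                        edges.add(new int[] {source, target});
                    }
                }
            }
            return edges;
        }
    }

    public static void main(String[] args) {
        List<int[]> edges = new CompleteGraphSketch(4).generate();
        // A complete graph on 4 vertices has 4 * 3 = 12 directed edges.
        System.out.println("edge count: " + edges.size());
    }
}
```

In the real documentation the same two steps would presumably be shown with the generator classes added in this pull request and a Flink execution environment, which this sketch deliberately leaves out.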
> Gelly Graph Generators
> ----------------------
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Affects Versions: 1.0.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be
> useful for performing scalability, stress, and regression testing as well as
> benchmarking and comparing algorithms, for both Flink users and developers.
> Generated data is infinitely scalable yet described by a few simple
> parameters and can often substitute for user data or sharing large files when
> reporting issues.
> There are multiple categories of graphs as documented by
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
> and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf].
> A key consideration is that the graphs should source randomness from a
> seedable PRNG and generate the same Graph regardless of parallelism.
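
One possible way to meet the requirement quoted above (randomness from a seedable PRNG, same graph regardless of parallelism) is to split the output into fixed-size blocks and seed each block's PRNG from the user seed and the block index only. The plain-Java sketch below simulates that idea. It is an assumption for illustration, not the approach taken in the pull request: the class name, the block size, the seed-mixing constant, and the uniform choice of endpoints (rather than RMat) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class SeededGenerationSketch {

    // Fixed block size; it must not depend on the parallelism.
    public static final int EDGES_PER_BLOCK = 8;

    // The edges of one block depend only on (seed, blockIndex).
    public static List<int[]> generateBlock(long seed, int blockIndex, int vertexCount) {
        // Arbitrary mixing of seed and block index (an assumption of this sketch).
        Random rng = new Random(seed + 0x9E3779B97F4A7C15L * (blockIndex + 1));
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < EDGES_PER_BLOCK; i++) {
            edges.add(new int[] {rng.nextInt(vertexCount), rng.nextInt(vertexCount)});
        }
        return edges;
    }

    // Simulates `parallelism` workers; worker w handles blocks w, w + p, w + 2p, ...
    public static List<String> generate(long seed, int vertexCount, int blockCount, int parallelism) {
        Map<Integer, List<int[]>> byBlock = new TreeMap<>();
        for (int worker = 0; worker < parallelism; worker++) {
            for (int block = worker; block < blockCount; block += parallelism) {
                byBlock.put(block, generateBlock(seed, block, vertexCount));
            }
        }
        // Collect the edges in block order so the results can be compared directly.
        List<String> edges = new ArrayList<>();
        for (List<int[]> blockEdges : byBlock.values()) {
            for (int[] edge : blockEdges) {
                edges.add(edge[0] + "->" + edge[1]);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> serial = generate(42L, 100, 16, 1);
        List<String> parallel = generate(42L, 100, 16, 4);
        System.out.println(serial.equals(parallel)); // prints "true"
    }
}
```

Because no block's contents depend on which worker produces it, changing the parallelism only changes who generates each block, not what is generated.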
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)