[ https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232653#comment-15232653 ]

ASF GitHub Bot commented on FLINK-2909:
---------------------------------------

Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1807#issuecomment-207546079
  
    @vasia, thank you for the recommendations; the improvements are almost ready 
to push. Running the `Graph500` example on an AWS c4.8xlarge (with 36 'virtual 
cores') generated a billion edges (scale 26, edge factor 16) in 25.8s wall time 
(23.4s execution time). When simplifying the graph with 'clip-and-flip', the 
runtime was 2m33s wall time (2m31s execution time), and when performing a full 
flip (thus doubling the number of edges), the runtime was 5m00s wall time (4m58s 
execution time).
    
    Many algorithms require edges to be sorted, so the cost of sorting during 
simplification is already accounted for. Edge generation itself should scale 
beautifully.
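A minimal Python sketch of the sizing arithmetic and the two simplification modes mentioned above follows. It assumes the usual definitions. Clip-and-flip keeps only edges with source < target and then adds each reversed edge. A full flip adds the reverse of every non-loop edge. This is an illustration, not Gelly's `Graph500` code, and the function names are hypothetical. Scale 26 with edge factor 16 gives 2^26 × 16 = 1,073,741,824 edges. Because roughly half of the generated edges survive the clip, a full flip emits about twice as many intermediate edges as clip-and-flip before sorting and deduplication.

```python
# Hypothetical helpers illustrating Graph500 sizing and two ways to turn a
# directed edge list into an undirected (symmetric) one.


def graph500_size(scale, edge_factor):
    """Vertex and edge counts for a Graph500-style generator."""
    vertices = 2 ** scale
    return vertices, edge_factor * vertices


def clip_and_flip(edges):
    """Assumed definition: keep edges with source < target (one triangle of
    the adjacency matrix), then add the reverse of each kept edge."""
    clipped = [(s, t) for s, t in edges if s < t]
    return clipped + [(t, s) for s, t in clipped]


def full_flip(edges):
    """Assumed definition: drop self-loops, then add the reverse of every
    remaining edge, roughly doubling the edge count."""
    kept = [(s, t) for s, t in edges if s != t]
    return kept + [(t, s) for s, t in kept]


def simplify(edges):
    """Sort and deduplicate; the sorted order can be reused downstream."""
    return sorted(set(edges))


if __name__ == "__main__":
    print(graph500_size(26, 16))  # (67108864, 1073741824)
    sample = [(0, 1), (1, 0), (2, 1), (0, 2), (2, 2), (0, 1)]
    print(len(clip_and_flip(sample)), len(full_flip(sample)))  # 6 10
    print(simplify(clip_and_flip(sample)))  # [(0, 1), (0, 2), (1, 0), (2, 0)]
```

Note that the two modes can yield different graphs. In this sketch, clip-and-flip discards `(2, 1)` because it lies in the clipped triangle and has no counterpart.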


> Gelly Graph Generators
> ----------------------
>
>                 Key: FLINK-2909
>                 URL: https://issues.apache.org/jira/browse/FLINK-2909
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.0.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is arbitrarily scalable yet described by a few simple 
> parameters, and can often substitute for user data or for sharing large files 
> when reporting issues.
> There are multiple categories of graphs, as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
> and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> small enough to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same graph regardless of parallelism.
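One way to meet the "same graph regardless of parallelism" requirement is to make the unit of work a fixed block rather than a worker. For a scalable graph, that block can be a range of source vertices. For a stochastic graph, it can be a numbered block of edges with its own seeded PRNG. The Python sketch below simulates workers with plain loops. `complete_graph`, `rmat_block`, and `generate` are hypothetical names, and the per-block seeding scheme is an assumption. The quadrant probabilities are the Graph500 defaults (a=0.57, b=0.19, c=0.19, d=0.05). This is not Gelly's implementation.

```python
import random

# Graph500 default RMat quadrant probabilities; d = 1 - A - B - C = 0.05.
A, B, C = 0.57, 0.19, 0.19


def complete_graph(n, parallelism):
    """Scalable generator: each simulated worker emits the edges of the source
    vertices assigned to it, so the union does not depend on parallelism."""
    edges = []
    for worker in range(parallelism):
        for src in range(worker, n, parallelism):
            edges.extend((src, dst) for dst in range(n) if dst != src)
    return sorted(edges)  # sorted only so runs can be compared


def rmat_block(seed, block, edges_per_block, scale):
    """Stochastic generator: one block of RMat edges from a PRNG seeded by
    (seed, block). Seeding per block rather than per worker keeps the output
    independent of how blocks are assigned to workers."""
    rng = random.Random(f"{seed}:{block}")  # str seeds are deterministic
    edges = []
    for _ in range(edges_per_block):
        src = dst = 0
        for _ in range(scale):  # choose one quadrant per bit of the vertex IDs
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < A:
                pass  # top-left: both bits stay 0
            elif r < A + B:
                dst |= 1  # top-right
            elif r < A + B + C:
                src |= 1  # bottom-left
            else:
                src |= 1  # bottom-right
                dst |= 1
        edges.append((src, dst))
    return edges


def generate(seed, scale, edge_factor, parallelism, blocks=16):
    """Spread a fixed number of edge blocks round-robin over simulated workers."""
    edge_count = edge_factor * 2 ** scale
    if edge_count % blocks:
        raise ValueError("blocks must divide the edge count")
    per_block = edge_count // blocks
    edges = []
    for worker in range(parallelism):
        for block in range(worker, blocks, parallelism):
            edges.extend(rmat_block(seed, block, per_block, scale))
    return sorted(edges)  # sorted only so runs can be compared


if __name__ == "__main__":
    print(complete_graph(5, 1) == complete_graph(5, 3))  # True
    print(generate(42, 4, 16, 1) == generate(42, 4, 16, 7))  # True
    print(len(generate(42, 4, 16, 3)))  # 256
```

The block count is fixed independently of parallelism. Changing the number of workers only changes which worker generates a given block, never the block's contents.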



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
