[ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099043#comment-13099043 ]

## Advertising

Jake Mannix commented on GIRAPH-26: ----------------------------------- Dmitriy, I was thinking along the lines of a less-rigorous version of the kroeneker product technique, as described in that paper. Essentially you do the following, to find the adjacency matrix entry at row 1523, column 4918, take four "seed matrices" (most likely pseudorandom variations on a single fixed seed matrix), each of size 10x10 (could be any base, but it's easiest to consider decimal notation, and 10 is big enough to actually have a sensible small graph you're going to kroenker iterate on): A_{1523, 4918} = S_{1,4} * T_{5,4} * U_{2,1} * V_{3,8} This requires no communication between the vertexes, and is totally fixed (ie deterministic) once you choose your seed matrices. The trick is to figure out (fast!) which entries are nonzero, for any given vertex: A_{1523, *} = S_{1,*} * T_{5,*} * U_{2,*} * V_{3,*} So at this point, you know you're dealing with row 1 of S, row 5 of T, row 2 of U, and row 3 of V. Well, if each of these matrices is represented sparsely, and you can iterate quickly over *only* their nonzero entries (and there should be only a few in each row, as we're talking about a 10-vertex seed graph which is not too connected), and thus iterate over only the nonzero entries in the edge list for vertex 1523 very quickly as well: if row 1 of S has nonzero entries at points 1,4,9, row 5 of T has nonzeroes at 5,6, row 2 of U has nonzeroes only at 0, and row 3 of V has nonzeroes at 0,3,4, then we'll have a total of 3*2*1*3 = 18 nonzero edges for vertex 1523, and we can iterate over them very quickly. I think this should make a pretty nice, relatively scale-free graph which has density parametrizable by initial 10x10 seed graph, with randomness introduced by how you want to create those S,T,U... variations on that seed. And it should be embarrassingly parallel. Anything I'm saying make no sense at all? > Improve PseudoRandomVertexInputFormat to create a more realistic synthetic > graph (e.g. power-law distributed vertex-cardinality). > --------------------------------------------------------------------------------------------------------------------------------- > > Key: GIRAPH-26 > URL: https://issues.apache.org/jira/browse/GIRAPH-26 > Project: Giraph > Issue Type: Test > Components: benchmark > Reporter: Jake Mannix > Priority: Minor > > The PageRankBenchmark class, to be a proper benchmark, should run over graphs > which look more like data seen in the wild, and web link graphs, social > network graphs, and text corpora (represented as a bipartite graph) all have > power-law distributions, so benchmarking a synthetic graph which looks more > like this would be a nice test which would stress cases of uneven > split-distribution and bottlenecks of subclusters of the graph of heavily > connected vertices. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira