[
https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429618#comment-13429618
]
Eli Reisman commented on GIRAPH-26:
-----------------------------------
Sean, nice work. Some things to think about:
To get your input seed matrix into this new input format, try adding some
constants to GiraphJob.java in the graph/ directory:
/** Allows a seed matrix to be entered at the command line in the format:
 * [0.1 0.2 0.3 seans.format listed.here etc... ]
 */
public static final String KROENECKER_SEED_MATRIX = "giraph.kroenecker.seed";
/** A default setting for KROENECKER_SEED_MATRIX if no command line argument is
 * supplied */
public static final String KROENECKER_SEED_MATRIX_DEFAULT =
    "0.5 11.1 .33 .more .clever .numbers .here";
Then, inside your code, you can look up these constants in the Configuration.
The get methods take a default value, so you can substitute
KROENECKER_SEED_MATRIX_DEFAULT where you currently pass "" when nothing is
entered. This also avoids hardcoding the defaults inside your IO format itself
and colocates them with the other defaults, where new users can review all the
options at once. Soon there will be a specific class for this, GiraphConf, but
for now GiraphJob is the place to put it. GiraphRunner and the framework will
ensure that if someone enters data under "giraph.kroenecker.seed", it ends up
in the Configuration object, ready for you to pull out using the technique
your code already employs.
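A rough sketch of the lookup-with-default pattern described above, not the
actual patch code. To stay self-contained it uses java.util.Properties as a
stand-in for Hadoop's Configuration (whose get(name, defaultValue) behaves
similarly). The class name, the readSeed helper, and the numeric default
string are assumptions made up for illustration only.

```java
import java.util.Properties;

/** Sketch only: Properties stands in for Hadoop's Configuration here. */
class SeedMatrixConfigSketch {
    /** Same key as suggested for GiraphJob. */
    static final String KROENECKER_SEED_MATRIX = "giraph.kroenecker.seed";
    /** Hypothetical numeric default, chosen only for this example. */
    static final String KROENECKER_SEED_MATRIX_DEFAULT = "0.9 0.5 0.5 0.1";

    /** Reads the seed string, falls back to the default, and parses it. */
    static double[] readSeed(Properties conf) {
        String raw = conf.getProperty(KROENECKER_SEED_MATRIX,
                                      KROENECKER_SEED_MATRIX_DEFAULT);
        // Treat an empty value the same as a missing one.
        if (raw.trim().isEmpty()) {
            raw = KROENECKER_SEED_MATRIX_DEFAULT;
        }
        String[] tokens = raw.trim().split("\\s+");
        double[] seed = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            seed[i] = Double.parseDouble(tokens[i]);
        }
        return seed;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Nothing set: the default string is parsed.
        System.out.println(readSeed(conf).length);
        // Value supplied (as the framework would do from the command line).
        conf.setProperty(KROENECKER_SEED_MATRIX, "0.1 0.2 0.3");
        System.out.println(readSeed(conf).length);
    }
}
```

With this shape the input format never needs its own hardcoded fallback; it
only asks the configuration for the key and names the shared default.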
I want to ask a few more things but I will wait for the updated patch. This
will be super useful to all of us for testing our code, thanks!
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic
> graph (e.g. power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-26
> URL: https://issues.apache.org/jira/browse/GIRAPH-26
> Project: Giraph
> Issue Type: Test
> Components: benchmark
> Affects Versions: 0.2.0
> Reporter: Jake Mannix
> Assignee: Sean Choi
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-26-1.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs
> which look more like data seen in the wild, and web link graphs, social
> network graphs, and text corpora (represented as a bipartite graph) all have
> power-law distributions, so benchmarking a synthetic graph which looks more
> like this would be a nice test which would stress cases of uneven
> split-distribution and bottlenecks of subclusters of the graph of heavily
> connected vertices.