[
https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438378#comment-13438378
]
Eli Reisman commented on GIRAPH-26:
-----------------------------------
This is looking really great, nice work.
- I'd like to go over your approach to making sure each worker
processes a unique part of the graph. Was using the SplitIndex IDs good enough
for your purposes, or do you still need each worker to process a range of
vertex IDs? Do you have other requirements on your wish list as far
as guaranteeing that one worker processes each "virtual input split" you
mentioned before?
- This is seriously mathematical stuff, so var names like "lowDecisionBoundary"
and "upperEdgeRatio" are fantastic. Maybe replace names like "tempP1",
"highCurrCumsum", and "array" with something long and annoying and super easy to
read/understand. Sorry. Go easy on us, most of us have public school educations. ;)
- There are a couple of typos in the comments. In general, maybe slap a Javadoc
comment on every method, even the overrides, since you're doing nonstandard stuff
here, and include @param and @return tags for all of them. Don't be afraid to throw
in a few more inline comments in the methods, just a one-liner here and there to
give the reader a heads-up about each major step in the algorithm.
- Not sure what's up with using "procID" as the random seed. If you need a different
seed per worker, you can probably dig up the hostname/port combo and hash it
from where your code sits in the framework. Let me know if you're curious about
this option.
- {insert brilliant math review here...}
- I'm sorry in advance, but what about... a unit test? Again, let me know if you
need a leg up on this. If someone feels comfortable giving the math a thumbs up
without an included test case, we're probably good here.
- If Jakob were here, he'd say "don't delete old patches when you put up a new
one." Of course, I've zapped a few too :) so I can't say it.
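For reference, here is a minimal sketch of the split-uniqueness and seeding points above, under stated assumptions: each split with index `splitIndex` out of `numSplits` owns a disjoint, contiguous, half-open range of vertex IDs, and each worker seeds its generator from a hash of its hostname and port. `SplitAssignmentSketch`, `vertexIdRangeForSplit`, and `seedForWorker` are hypothetical names, not Giraph API and not the code in the patch. The Javadoc shows the @param/@return style suggested above.

```java
import java.util.Objects;

/**
 * Hypothetical helpers (not Giraph API, not the patch's code) showing one way
 * to give each input split a disjoint vertex ID range and each worker a seed.
 */
public final class SplitAssignmentSketch {
  private SplitAssignmentSketch() {}

  /**
   * Computes the half-open vertex ID range owned by one split. Ranges for
   * splitIndex = 0 .. numSplits - 1 are disjoint and together cover
   * [0, totalVertices), so no vertex is generated by two workers.
   *
   * @param splitIndex index of this split, in [0, numSplits)
   * @param numSplits total number of splits, must be positive
   * @param totalVertices total number of vertices in the synthetic graph
   * @return {firstVertexId, endVertexIdExclusive}
   */
  public static long[] vertexIdRangeForSplit(
      int splitIndex, int numSplits, long totalVertices) {
    if (numSplits <= 0 || splitIndex < 0 || splitIndex >= numSplits) {
      throw new IllegalArgumentException(
          "splitIndex " + splitIndex + " out of range for " + numSplits + " splits");
    }
    // Integer division spreads the remainder, so range sizes differ by at most one.
    long firstVertexId = splitIndex * totalVertices / numSplits;
    long endVertexIdExclusive = (splitIndex + 1L) * totalVertices / numSplits;
    return new long[] {firstVertexId, endVertexIdExclusive};
  }

  /**
   * Derives a per-worker random seed by hashing the worker's hostname and port,
   * as suggested in the review, instead of relying on a process ID.
   *
   * @param hostname the worker's hostname
   * @param port the worker's port
   * @return a seed suitable for {@link java.util.Random}
   */
  public static long seedForWorker(String hostname, int port) {
    return Objects.hash(hostname, port);
  }

  public static void main(String[] args) {
    int numSplits = 3;
    long totalVertices = 10;
    for (int split = 0; split < numSplits; split++) {
      long[] range = vertexIdRangeForSplit(split, numSplits, totalVertices);
      System.out.println("split " + split + ": [" + range[0] + ", " + range[1] + ")");
    }
  }
}
```

Running `main` prints `split 0: [0, 3)`, `split 1: [3, 6)`, and `split 2: [6, 10)`. One design consideration: a hostname/port seed changes whenever workers land on different machines, so the generated graph differs between runs; seeding from a fixed base seed combined with the split index would instead keep the graph identical across runs.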
Great work, Sean. Impressive stuff. Congrats again on the paper, too!
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic
> graph (e.g. power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-26
> URL: https://issues.apache.org/jira/browse/GIRAPH-26
> Project: Giraph
> Issue Type: Test
> Components: benchmark
> Affects Versions: 0.2.0
> Reporter: Jake Mannix
> Assignee: Sean Choi
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-26.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs
> that look more like data seen in the wild. Web link graphs, social
> network graphs, and text corpora (represented as bipartite graphs) all have
> power-law distributions, so benchmarking on a synthetic graph that looks more
> like them would be a nice test that stresses uneven split distribution and
> bottlenecks caused by subclusters of heavily connected vertices.
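One common way to get power-law distributed vertex degrees, as the issue asks for, is inverse-transform sampling: if U is uniform on [0, 1), then x_min * (1 - U)^(-1/(alpha - 1)) follows a continuous power law with density proportional to x^(-alpha) for x >= x_min. The sketch below floors that value to get an approximate discrete degree, caps it, and then picks distinct random targets for the edges. It is a swapped-in illustration, not the algorithm in GIRAPH-26.patch; `PowerLawDegreeSketch`, its methods, and the exponent 2.5 are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical sketch (not the GIRAPH-26 patch): power-law out-degrees via
 * inverse-transform sampling, plus uniformly chosen distinct edge targets.
 */
public final class PowerLawDegreeSketch {
  private PowerLawDegreeSketch() {}

  /**
   * Samples one out-degree from a continuous power law with density
   * proportional to x^-alpha for x >= minDegree, floored and capped.
   *
   * @param random source of uniform randomness
   * @param alpha power-law exponent, must be greater than 1
   * @param minDegree smallest allowed degree, at least 1
   * @param maxDegree largest allowed degree, e.g. numVertices - 1
   * @return a degree in [minDegree, maxDegree]
   */
  public static long sampleDegree(
      Random random, double alpha, long minDegree, long maxDegree) {
    if (alpha <= 1.0 || minDegree < 1 || maxDegree < minDegree) {
      throw new IllegalArgumentException("need alpha > 1 and 1 <= minDegree <= maxDegree");
    }
    double uniform = random.nextDouble();  // in [0, 1), so 1 - uniform is in (0, 1]
    // Inverse CDF: x = minDegree * (1 - u)^(-1 / (alpha - 1)).
    double degree = minDegree * Math.pow(1.0 - uniform, -1.0 / (alpha - 1.0));
    // Flooring gives an approximate discrete power law; the cap bounds the tail.
    return Math.min(maxDegree, (long) Math.floor(degree));
  }

  /**
   * Picks distinct targets uniformly at random, excluding sourceId.
   *
   * @param random source of randomness
   * @param sourceId the vertex whose out-edges are being generated
   * @param degree number of targets, at most numVertices - 1
   * @param numVertices vertex IDs range over [0, numVertices)
   * @return the chosen target vertex IDs
   */
  public static long[] sampleTargets(
      Random random, long sourceId, long degree, long numVertices) {
    if (degree > numVertices - 1) {
      throw new IllegalArgumentException("degree exceeds numVertices - 1");
    }
    Set<Long> targets = new LinkedHashSet<>();
    while (targets.size() < degree) {
      // floorMod maps any long into [0, numVertices); the modulo bias is
      // negligible when numVertices is far smaller than 2^64.
      long candidate = Math.floorMod(random.nextLong(), numVertices);
      if (candidate != sourceId) {
        targets.add(candidate);
      }
    }
    return targets.stream().mapToLong(Long::longValue).toArray();
  }

  public static void main(String[] args) {
    Random random = new Random(42L);  // fixed seed so the example is repeatable
    int numVertices = 1000;
    long[] degrees = new long[numVertices];
    for (int v = 0; v < numVertices; v++) {
      degrees[v] = sampleDegree(random, 2.5, 1, numVertices - 1);
    }
    long[] firstTargets = sampleTargets(random, 0L, degrees[0], numVertices);
    Arrays.sort(degrees);
    System.out.println("median degree = " + degrees[numVertices / 2]
        + ", max degree = " + degrees[numVertices - 1]
        + ", vertex 0 has " + firstTargets.length + " out-edges");
  }
}
```

With alpha = 2.5 and a minimum degree of 1, about 65% of vertices get degree 1 (since 1 - 2^(-1.5) is roughly 0.646), while a few vertices get very large degrees; that skew is what would exercise the uneven split distribution and hot-spot bottlenecks the issue describes.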