[ 
https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438378#comment-13438378
 ] 

Eli Reisman commented on GIRAPH-26:
-----------------------------------

This is looking really great, nice work.

- I'd like to go over how your solution makes sure each worker processes a 
unique part of the graph. Were the SplitIndex IDs good enough for your 
purposes, or do you still need each worker to process a range of vertex IDs? 
Do you have other requirements on your wish list as far as guaranteeing that 
one worker processes each "virtual input split," as you indicated before?

- This is seriously mathematical stuff, so var names like "lowDecisionBoundary" 
and "upperEdgeRatio" are fantastic. Maybe replace names like "tempP1", 
"highCurrCumsum", and "array" with something long and annoying and super easy 
to read/understand. Sorry. Go easy on us, most of us have public school 
educations. ;)

- There are a couple of typos in the comments. In general, maybe slap a Javadoc 
comment on every method, even the Overrides, since you're doing nonstandard 
stuff here, and include @param and @return tags for all. Don't be afraid to 
throw in a few more inline comments in the methods, just a one-liner here and 
there to give the reader a heads up about each major step in the algorithm 
code.

- Not sure what's up with the "procID" as a random seed. If you need a 
different seed per worker, you can probably dig up the hostname/port combo and 
hash it from where your code sits in the framework. Let me know if you're 
curious about this option.

- {insert brilliant math review here...}

- I'm sorry in advance: what about...a unit test? Again, let me know if you 
need a leg up on this. If someone feels comfortable giving the math a thumbs up 
without an included test case, we're probably good here.

- If Jakob were here, he'd say "don't delete old patches when you put up a new 
one." Of course, I've zapped a few too :) so I can't say it.
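
To make the hostname/port seed idea above a bit more concrete, here is a rough 
sketch. Everything in it is an assumption for illustration: "WorkerSeed" and 
"fromHostAndPort" are hypothetical names, not Giraph classes or methods, and 
the hostname and port in main() are made up. How you actually get the 
hostname/port from the framework is not shown. It also shows the Javadoc style 
(@param/@return) suggested above.

```java
import java.util.Objects;
import java.util.Random;

/**
 * Hypothetical helper (not part of Giraph): derives a per-worker random seed
 * from the worker's hostname and port.
 */
final class WorkerSeed {
  private WorkerSeed() { }

  /**
   * Combine a worker's hostname and port into a seed value.
   *
   * @param hostname hostname of the worker (assumed available to the caller)
   * @param port port the worker listens on
   * @return a seed that is stable for a given hostname/port pair
   */
  static long fromHostAndPort(String hostname, int port) {
    // Objects.hash is deterministic for String and Integer arguments,
    // so the same worker always gets the same seed across runs.
    return Objects.hash(hostname, port);
  }

  public static void main(String[] args) {
    // Made-up hostname/port, purely for illustration.
    long seed = fromHostAndPort("worker-1.example.com", 30000);
    Random random = new Random(seed);
    System.out.println("seed=" + seed + " firstDraw=" + random.nextInt(100));
  }
}
```

A seed built this way is reproducible per worker, which should also make a 
unit test easier to write. Two different hostname/port pairs can in principle 
hash to the same value, so treat this as a starting point only.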

Great work, Sean. Impressive stuff. Congrats again on the paper, too!
                
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic 
> graph (e.g. power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-26
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-26
>             Project: Giraph
>          Issue Type: Test
>          Components: benchmark
>    Affects Versions: 0.2.0
>            Reporter: Jake Mannix
>            Assignee: Sean Choi
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-26.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs 
> which look more like data seen in the wild, and web link graphs, social 
> network graphs, and text corpora (represented as a bipartite graph) all have 
> power-law distributions, so benchmarking a synthetic graph which looks more 
> like this would be a nice test which would stress cases of uneven 
> split-distribution and bottlenecks of subclusters of the graph of heavily 
> connected vertices.


        
