[ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464969#comment-13464969 ]
Jake Mannix commented on GIRAPH-26:
-----------------------------------
Regarding COLT, there are a couple of things to be careful about. First, if
you include the whole COLT jar, you may be pulling in some of the statistics
code, which comes with a disallowed license (it requires that the code *not* be
used in military applications, a restriction the ASF cannot force its consumers
to abide by). Second, COLT is *completely* unmaintained now. We over in
Mahout-land contacted the original author and got his blessing to absorb/adopt
the code into Mahout, so all of the code with a proper license now lives in the
mahout-math maven module, and whatever was not ASL-compatible has been removed.
I would suggest refactoring to depend on mahout-math instead (changing package
names in the imports may be all that is required, but it may take a bit more).
At least with that, when you find a bug (which we've done several times: COLT
has no unit tests), you can bother someone to fix it and release a new version.
With raw COLT, you have nobody to bother about it: it's been abandoned.
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic
> graph (e.g. power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-26
> URL: https://issues.apache.org/jira/browse/GIRAPH-26
> Project: Giraph
> Issue Type: Test
> Components: benchmark
> Affects Versions: 0.2.0
> Reporter: Jake Mannix
> Assignee: Sean Choi
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-26-2.patch, GIRAPH-26-3.patch, GIRAPH-26.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs
> that look more like data seen in the wild. Web link graphs, social network
> graphs, and text corpora (represented as a bipartite graph) all have power-law
> degree distributions, so benchmarking on a synthetic graph with that shape
> would be a nice test: it would stress cases of uneven split-distribution and
> bottlenecks around subclusters of heavily connected vertices.
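
As a rough sketch of what "power-law distributed vertex-cardinality" could mean
in code, the class below draws per-vertex out-degrees from a bounded power law
using inverse-transform sampling and nothing beyond java.util.Random. Everything
in it is an assumption made for illustration: the class name
PowerLawDegreeSampler, its parameters (minDegree, maxDegree, alpha), and the
sampling method itself are hypothetical and are not taken from the attached
patches, from COLT, or from mahout-math.

```java
import java.util.Random;

/**
 * Hypothetical helper (not part of Giraph, COLT, or mahout-math): samples
 * integer degrees d in [minDegree, maxDegree] with P(d) roughly proportional
 * to d^(-alpha), by flooring a draw from a continuous bounded power law.
 */
final class PowerLawDegreeSampler {
    private final Random random;
    private final int minDegree;
    private final int maxDegree;
    private final double oneMinusAlpha;
    private final double lowTerm;
    private final double highTerm;

    PowerLawDegreeSampler(long seed, int minDegree, int maxDegree, double alpha) {
        if (minDegree < 1 || maxDegree < minDegree) {
            throw new IllegalArgumentException("need 1 <= minDegree <= maxDegree");
        }
        if (alpha <= 1.0) {
            throw new IllegalArgumentException("this sketch assumes alpha > 1");
        }
        this.random = new Random(seed);
        this.minDegree = minDegree;
        this.maxDegree = maxDegree;
        this.oneMinusAlpha = 1.0 - alpha;
        // Precompute the two endpoint terms of the inverse CDF on the
        // continuous range [minDegree, maxDegree + 1).
        this.lowTerm = Math.pow(minDegree, oneMinusAlpha);
        this.highTerm = Math.pow(maxDegree + 1.0, oneMinusAlpha);
    }

    /** Returns the next sampled degree, always within [minDegree, maxDegree]. */
    int nextDegree() {
        double u = random.nextDouble(); // uniform in [0, 1)
        // Inverse CDF of the bounded continuous power law.
        double x = Math.pow(lowTerm + u * (highTerm - lowTerm), 1.0 / oneMinusAlpha);
        int degree = (int) Math.floor(x);
        // Clamp to guard against floating-point rounding at the edges.
        return Math.max(minDegree, Math.min(maxDegree, degree));
    }

    public static void main(String[] args) {
        PowerLawDegreeSampler sampler = new PowerLawDegreeSampler(42L, 1, 1000, 2.0);
        int[] histogram = new int[1001];
        for (int i = 0; i < 100000; i++) {
            histogram[sampler.nextDegree()]++;
        }
        // Print the head of the distribution; most vertices get few edges.
        for (int d = 1; d <= 5; d++) {
            System.out.println("degree " + d + ": " + histogram[d]);
        }
    }
}
```

An input format could call nextDegree() once per vertex to decide how many
edges to emit, with the seed derived from the split so that the generated graph
stays reproducible. Whether that fits PseudoRandomVertexInputFormat as written
is not something this sketch tries to settle.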