[ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464969#comment-13464969 ]

Jake Mannix commented on GIRAPH-26:
-----------------------------------

Regarding COLT, there are a couple of tricky points. First, you have to be
careful: if you include the whole COLT jar, you may be pulling in some of the
statistics code, which comes with a disallowed license (it requires that the
code *not* be used in military applications, a restriction the ASF cannot force
its consumers to abide by). Secondly, COLT is *completely* unmaintained now.
We over in Mahout-land contacted the original author and got his blessing to
absorb/adopt the code into Mahout, so all of the code under a compatible
license has been absorbed into the mahout-math Maven module, and the code that
was not ASL-compatible has been removed.

I would suggest refactoring to depend on mahout-math instead (changing the
package names in the imports may be all that is required, though it may take a
bit more). At least then, when you find a bug (which we have, several times:
COLT has no unit tests), you can bother someone to fix it and release a new
version. With raw COLT, you have nobody to bother: it's been abandoned.
                
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic 
> graph (e.g. power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-26
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-26
>             Project: Giraph
>          Issue Type: Test
>          Components: benchmark
>    Affects Versions: 0.2.0
>            Reporter: Jake Mannix
>            Assignee: Sean Choi
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-26-2.patch, GIRAPH-26-3.patch, GIRAPH-26.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs 
> which look more like data seen in the wild, and web link graphs, social 
> network graphs, and text corpora (represented as a bipartite graph) all have 
> power-law distributions, so benchmarking a synthetic graph which looks more 
> like this would be a nice test which would stress cases of uneven 
> split-distribution and bottlenecks of subclusters of the graph of heavily 
> connected vertices.
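One way to get the "power-law distributed vertex-cardinality" the issue asks for is sketched below. It is a minimal sketch, not Giraph's actual PseudoRandomVertexInputFormat code, and it rests on a few assumptions. Each vertex's out-degree is drawn so that P(k) is roughly proportional to k^-alpha, using inverse-transform sampling of a continuous Pareto distribution, then floored and clamped to [minDegree, numVertices - 1]. Edge targets are picked uniformly at random. Only java.util.Random is used, so the sketch needs neither COLT nor mahout-math. The class name PowerLawDegreeSampler and its parameters are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical helper (not part of Giraph): builds a synthetic graph whose
 * out-degrees follow an approximate power law, P(k) ~ k^(-alpha), k >= minDegree.
 */
public final class PowerLawDegreeSampler {
  private PowerLawDegreeSampler() {}

  /** Samples one degree in [minDegree, maxDegree]; requires alpha > 1. */
  public static int sampleDegree(Random rng, double alpha, int minDegree,
                                 int maxDegree) {
    double u = rng.nextDouble(); // uniform in [0, 1), so 1 - u is in (0, 1]
    // Inverse CDF of a continuous Pareto with x_min = minDegree and density
    // proportional to x^(-alpha): x = x_min * (1 - u)^(-1 / (alpha - 1)).
    double x = minDegree * Math.pow(1.0 - u, -1.0 / (alpha - 1.0));
    // Floor to an integer degree and clamp the heavy tail to the graph size.
    return (int) Math.min(maxDegree, Math.floor(x));
  }

  /** Returns adjacency lists: vertex v gets distinct random targets != v. */
  public static long[][] generateEdges(int numVertices, double alpha,
                                       int minDegree, long seed) {
    if (numVertices < 2 || alpha <= 1.0 || minDegree < 1
        || minDegree > numVertices - 1) {
      throw new IllegalArgumentException("invalid generator parameters");
    }
    Random rng = new Random(seed);
    long[][] edges = new long[numVertices][];
    for (int v = 0; v < numVertices; v++) {
      int degree = sampleDegree(rng, alpha, minDegree, numVertices - 1);
      Set<Long> targets = new HashSet<>();
      // Rejection sampling: draw uniform targets, skipping self-loops and
      // duplicates, until the vertex has the sampled number of edges.
      while (targets.size() < degree) {
        long t = rng.nextInt(numVertices);
        if (t != v) {
          targets.add(t);
        }
      }
      edges[v] = targets.stream().mapToLong(Long::longValue).toArray();
    }
    return edges;
  }

  public static void main(String[] args) {
    long[][] edges = generateEdges(10_000, 2.5, 1, 42L);
    int[] degrees = Arrays.stream(edges).mapToInt(e -> e.length).toArray();
    Arrays.sort(degrees);
    // Most vertices have very few edges; a handful have many (the heavy tail).
    System.out.println("median degree = " + degrees[degrees.length / 2]
        + ", max degree = " + degrees[degrees.length - 1]);
  }
}
```

In this sketch only the out-degrees are skewed. Because targets are uniform, in-degrees stay roughly binomial. A generator that also skews in-degrees, such as preferential attachment or R-MAT, would stress hot-spot vertices and uneven splits harder, at the cost of a more involved implementation.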

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
