Gianmarco De Francisci Morales commented on GIRAPH-96:

In my opinion this would make things too complex.
I wouldn't want to have to keep my graph in HBase just to run Giraph on it.
It would also make HBase a required dependency.

I agree that this is a nice option to have, but I wouldn't make it the default.

Finally, a similar goal could be attained by streaming edges to local disk and 
reading them back with sequential scans when performing supersteps. This avoids 
network round trips entirely and should be much faster than scanning HBase.
You just need good out-of-core data structures and algorithms.
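
To make the idea concrete, here is a minimal sketch of spilling one vertex's edges to a local file and scanning them back sequentially. It is only an illustration under stated assumptions: EdgeSpill, EdgeVisitor, write() and scan() are hypothetical names that do not exist in Giraph, and the record layout (a long target vertex ID followed by a float edge weight) is assumed for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Giraph): keeps one vertex's edges in a local file. */
final class EdgeSpill {

    /** Callback invoked once per edge during a scan. */
    interface EdgeVisitor {
        void visit(long targetVertexId, float weight);
    }

    /**
     * Writes edges as fixed-width (long target, float weight) records,
     * replacing any previous content of the file. Returns the number written.
     */
    static long write(Path file, long[] targets, float[] weights) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (int i = 0; i < targets.length; i++) {
                out.writeLong(targets[i]);
                out.writeFloat(weights[i]);
            }
        }
        return targets.length;
    }

    /**
     * Reads the file front to back, handing each edge to the visitor.
     * Only one edge is held in memory at a time. Returns the number read.
     */
    static long scan(Path file, EdgeVisitor visitor) throws IOException {
        long count = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                long target;
                try {
                    target = in.readLong();
                } catch (EOFException endOfFile) {
                    break; // clean end of file: no more records
                }
                visitor.visit(target, in.readFloat());
                count++;
            }
        }
        return count;
    }
}
```

During a superstep, a vertex would call something like EdgeSpill.scan(file, (target, weight) -> ...) instead of iterating over an in-memory map, so memory use stays constant no matter how long the adjacency list is.
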
> Support for Graphs with Huge adjacency lists
> --------------------------------------------
>                 Key: GIRAPH-96
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-96
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.70.0
>            Reporter: Arun Suresh
>
> Currently the vertex initialize() method is passed the complete adjacency 
> list as a HashMap. All the current concrete implementations of Vertex iterate 
> over the adjacency list and recreate new data structures within the Vertex 
> instance to hold/manipulate the adjacency list. This would cease to be 
> feasible once the size of the adjacency list becomes really huge.
> I propose storing the adjacency list and all vertex information (and incoming 
> messages?) in a distributed data store such as HBase. The adjacency list can 
> be lazily loaded via HBase Scans. I was thinking of an HBase schema where the 
> row ID is a concatenation of VertexID+OutboundVertexId with a single column 
> containing the edge.
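
For reference, the row ID layout proposed in the quoted description could be sketched as below. This is only an illustration: EdgeRowKey and its methods are hypothetical, vertex IDs are assumed to be Java longs (the issue does not say what type they are), and no HBase API is used. Because both IDs are encoded as fixed-width big-endian bytes, every row for a given vertex shares the same 8-byte prefix and is therefore contiguous in a byte-ordered store.

```java
import java.nio.ByteBuffer;

/** Hypothetical row-key codec for the VertexID+OutboundVertexId schema. */
final class EdgeRowKey {

    /** Concatenates the two IDs as 8-byte big-endian values (16 bytes total). */
    static byte[] encode(long vertexId, long outboundVertexId) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(vertexId)
                .putLong(outboundVertexId)
                .array();
    }

    /** Key prefix shared by every edge row of the given vertex. */
    static byte[] prefix(long vertexId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(vertexId).array();
    }

    /** Recovers the outbound vertex ID from the second half of a row key. */
    static long outboundVertexId(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong(Long.BYTES);
    }
}
```

A lazy load of one adjacency list would then be a scan restricted to rows starting with prefix(vertexId), decoding each returned key with outboundVertexId().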
