HBase/Accumulo Input and Output formats
---------------------------------------
Key: GIRAPH-153
URL: https://issues.apache.org/jira/browse/GIRAPH-153
Project: Giraph
Issue Type: New Feature
Components: bsp
Affects Versions: 0.1.0
Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
Attachments: AccumuloRootMarker.java,
AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java,
AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java,
ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java,
HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, SetLongWritable.java,
SetTextWritable.java, TableRootMarker.java, TableRootMarkerInputFormat.java,
TableRootMarkerOutputFormat.java
Four abstract classes that wrap their respective delegate input/output formats
for
easy hooks into vertex input format subclasses. I've included some sample
programs that show two very simple graph
algorithms. I have a graph generator that builds out a very simple direct
structure, starting with a few 'root' nodes.
Root nodes are defined as nodes that is not listed as a child anywhere in the
graph.
Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source. Every
vertex starts thinking it's a root. At superstep 0, send a message down to each
child as a non-root notification. After superstep 1, only root nodes will have
never been messaged.
Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by
bundling the notification logic followed by root node propagation. Once we've
marked the appropriate nodes as roots, tell every child which roots it can be
traced back to via one or more spanning trees. This will take N + 2 supersteps
where N is the maximum number of hops from any root to any leaf, plus 2
supersteps for the initial root flagging.
I've included all relevant code plus DistributedCacheHelper.java for recursive
cache file and archive searches. It is more hadoop centric than giraph in
particular, but these jobs use it so I figured why not commit here.
These have been tested through local JobRunner, pseudo-distributed on the
aforementioned hardware, and full distributed on EC2. More details in the
comments.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira