Brian Femiano commented on GIRAPH-153:
The patch contains the entire submodule, including HBase and Accumulo unit
tests. It has been tested against Accumulo 1.4 (latest release) and HBase
0.90.5 with ZooKeeper 3.3.3. It includes 4 abstract classes designed to make it
easier to write subclasses that read from and write to these datastores.
The test package shows a few example subclasses which were needed to verify the
behavior. For now they only run in local mode and will be disabled if the user
supplies a jobtracker URI.
It builds exactly as described in the earlier comments. Simply run 'mvn verify'
and you'll get an isolated build.
A few caveats:
1) Users must 'mvn install' the Giraph artifact into their local repo, at least
until we get something posted on Maven Central.
2) I modified the pom.xml to exclude the artifact from the rat plugin. I
realize this is less than desirable, but I couldn't get anything running
despite numerous attempts at fixing the "too many unapproved licenses"
issues. I'm interested to hear your thoughts.
3) Duplicate BspCase in my submodule, at least until Giraph has a test
4) Initializing the AccumuloVertexInputFormat has some procedural limitations
inherent in the format design when run with the GiraphJob. It really expects to
have control of the Job instance. The resulting problems can be difficult to
track down. I tried to document them in my unit tests and provide some simple
error wrapping to help notify users when they hit them.
5) No README.txt or any wiki entry yet. I figured I'd wait and see what
feedback you guys had.
Hopefully people will find the submodule useful.
> HBase/Accumulo Input and Output formats
> Key: GIRAPH-153
> URL: https://issues.apache.org/jira/browse/GIRAPH-153
> Project: Giraph
> Issue Type: New Feature
> Components: bsp
> Affects Versions: 0.1.0
> Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
> Reporter: Brian Femiano
> Attachments: GIRAPH-153.patch
> Four abstract classes that wrap their respective delegate input/output
> formats for easy hooks into vertex input format subclasses. I've included
> some sample programs that show two very simple graph algorithms. I have a
> graph generator that builds out a very simple directed structure, starting
> with a few 'root' nodes. Root nodes are defined as nodes which are not listed
> as a child anywhere in the graph.
> Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source.
> Every vertex starts thinking it's a root. At superstep 0, send a message down
> to each child as a non-root notification. After superstep 1, only root nodes
> will have never been messaged.
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by
> bundling the notification logic followed by root node propagation. Once we've
> marked the appropriate nodes as roots, tell every child which roots it can be
> traced back to via one or more spanning trees. This will take N + 2
> supersteps: N is the maximum number of hops from any root to any leaf, and
> the extra 2 supersteps cover the initial root flagging.
> I've included all relevant code plus DistributedCacheHelper.java for
> recursive cache file and archive searches. It is more Hadoop-centric than
> Giraph-specific, but these jobs use it so I figured why not commit here.
> These have been tested through local JobRunner, pseudo-distributed on the
> aforementioned hardware, and full distributed on EC2. More details in the
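
The two passes described in the issue text (root flagging, then root
propagation) can be sketched in plain Java as below. This is only an
illustrative simulation under stated assumptions, not the code in the patch: it
uses no Giraph, HBase, or Accumulo API, the graph is assumed to be an in-memory
map from each vertex to its children, and the names RootMarkerSketch,
findRoots, and propagateRoots are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, self-contained simulation of the two root-marking passes.
// It does not use the Giraph API or any classes from the patch.
class RootMarkerSketch {

    // Collect every vertex that appears as a parent or as a child.
    static Set<String> allVertices(Map<String, List<String>> children) {
        Set<String> vertices = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            vertices.addAll(kids);
        }
        return vertices;
    }

    // Pass 1: every vertex starts as a presumed root. Each vertex "messages"
    // its children, and any vertex that receives a message is not a root.
    static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> roots = allVertices(children);
        for (List<String> kids : children.values()) {
            roots.removeAll(kids);
        }
        return roots;
    }

    // Pass 2: roots send their own id to their children; each vertex forwards
    // only the root ids it has not seen before, one hop per loop iteration.
    // Returns, for each vertex, the set of roots it can be traced back to.
    static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> known = new TreeMap<>();
        for (String v : allVertices(children)) {
            known.put(v, new TreeSet<>());
        }
        // frontier: vertex -> root ids it must forward in the next round
        Map<String, Set<String>> frontier = new HashMap<>();
        for (String root : findRoots(children)) {
            Set<String> self = new TreeSet<>();
            self.add(root);
            frontier.put(root, self);
        }
        while (!frontier.isEmpty()) {
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : frontier.entrySet()) {
                List<String> kids =
                    children.getOrDefault(e.getKey(), Collections.emptyList());
                for (String child : kids) {
                    Set<String> fresh = new TreeSet<>(e.getValue());
                    fresh.removeAll(known.get(child));
                    if (!fresh.isEmpty()) {
                        known.get(child).addAll(fresh);
                        next.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
                    }
                }
            }
            frontier = next;
        }
        return known;
    }
}
```

Forwarding only previously unseen root ids keeps the loop finite even if the
input contains a cycle; in a real job each loop iteration would correspond
roughly to one superstep of message passing.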