Brian Femiano commented on GIRAPH-153:

Maven submodules introduce build dependencies into the main parent build. The
giraph-formats-contrib submodule would have to be built as part of every
parent build, even if the user had no need for it, which is not what we want.
Submodules do allow individual components to be built independently and then
grouped together in one final build: you can build a submodule in isolation
and inherit its dependencies from the parent pom.

An advantage is that you can isolate specific dependencies into groups based
on which submodules need them. The main disadvantage is that it would chain
giraph-formats-contrib into the main giraph.jar build, when it is in fact not
a dependency.

To counter this I've built a subproject, managed by Maven, that lives in a
subdirectory within the main giraph trunk. It lists giraph as a jar dependency
and builds standalone. It does not introduce any parent->child build
relationships for the contrib module. Anyone who simply wishes to build the
main giraph.jar will not see it. It does, however, require giraph.jar to be
installed in the user's local Maven repo, at least until giraph is hosted in
Maven Central or some other Nexus repository.

Hope that explains it somewhat. If you guys would rather use submodules or see
a glaring issue with this approach, I'm happy to readjust.

> HBase/Accumulo Input and Output formats
> ---------------------------------------
>                 Key: GIRAPH-153
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-153
>             Project: Giraph
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.1.0
>         Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>            Reporter: Brian Femiano
> Four abstract classes that wrap their respective delegate input/output
> formats for easy hooks into vertex input format subclasses. I've included
> some sample programs that show two very simple graph algorithms. I have a
> graph generator that builds out a very simple directed structure, starting
> with a few 'root' nodes. Root nodes are defined as nodes which are not listed
> as a child anywhere in the graph.
> Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source.
> Every vertex starts out thinking it's a root. At superstep 0, each vertex
> sends a message down to each child as a non-root notification. After
> superstep 1, only root nodes will have never been messaged.
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by
> following the notification logic with root node propagation. Once we've
> marked the appropriate nodes as roots, tell every child which roots it can be
> traced back to via one or more spanning trees. This will take N + 2
> supersteps: N is the maximum number of hops from any root to any leaf, and
> the extra 2 supersteps are for the initial root flagging.
> I've included all relevant code plus DistributedCacheHelper.java for
> recursive cache file and archive searches. It is more Hadoop-centric than
> Giraph-centric, but these jobs use it so I figured why not commit it here.
> These have been tested through local JobRunner, pseudo-distributed on the
> aforementioned hardware, and fully distributed on EC2. More details in the
> comments.
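
The two root-marking algorithms in the quoted description can be sketched as a
small in-memory simulation. This is not the code attached to GIRAPH-153: it
uses plain Python dicts instead of Giraph vertices or Accumulo/HBase tables,
and the names mark_roots, propagate_roots and EXAMPLE_GRAPH are hypothetical.
It also assumes that a root sends its own id to its children during superstep
1, and that every vertex appears as a key in the adjacency dict.

```python
from collections import defaultdict

# Hypothetical example graph: vertex -> list of children. Roots are a and e.
EXAMPLE_GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["f"],
    "e": ["d"],
    "f": [],
}


def mark_roots(children):
    """Algorithm 1 sketch: a vertex stays a root unless a parent messages it."""
    is_root = {v: True for v in children}  # every vertex starts as a root
    # Superstep 0: each vertex sends a non-root notification to every child.
    inbox = defaultdict(list)
    for v, kids in children.items():
        for kid in kids:
            inbox[kid].append(v)
    # Superstep 1: any vertex that received a message is not a root.
    for v in inbox:
        is_root[v] = False
    return {v for v, flag in is_root.items() if flag}


def propagate_roots(children):
    """Algorithm 2 sketch: returns (roots reaching each vertex, supersteps)."""
    roots = mark_roots(children)  # supersteps 0 and 1
    reachable = {v: set() for v in children}
    # Assumption: still within superstep 1, each root records itself and
    # sends its own id down to its children.
    inbox = defaultdict(set)
    for r in roots:
        reachable[r].add(r)
        for kid in children[r]:
            inbox[kid].add(r)
    supersteps = 2
    # Each loop iteration is one further superstep: vertices absorb incoming
    # root ids and forward only the ids they had not seen before.
    while inbox:
        next_inbox = defaultdict(set)
        for v, incoming in inbox.items():
            new = incoming - reachable[v]
            reachable[v] |= new
            if new:
                for kid in children[v]:
                    next_inbox[kid] |= new
        inbox = next_inbox
        supersteps += 1
    return reachable, supersteps
```

In EXAMPLE_GRAPH the longest root-to-leaf path is a -> b -> d -> f, so N = 3,
and propagate_roots(EXAMPLE_GRAPH) reports 5 supersteps, matching the N + 2
estimate in the description.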

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
For more information on JIRA, see: http://www.atlassian.com/software/jira

