Improve the graph distribution of Giraph
Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
Currently, Giraph assumes that the data from the VertexInputFormat is sorted.
If the user data is not sorted by the vertex id, they must first run a
MapReduce or Pig job to generate a sorted dataset. This is often a bit
Giraph graph partitioning is currently range-based, an approach that has both
advantages and disadvantages. This JIRA proposes supporting both range-based
and hash-based partitioning to give the user more flexibility.
Design goals for the graph distribution:
* Allow vertices to be ordered or unordered
* Ability to repartition
* Select the partitioning scheme based on user needs (i.e., hash-based or range-based)
* Ability to provide user-specific hints about partitions
Hash-based distribution:
* Good vertex balancing across ranges for random data
* Bad at vertex id locality
Range-based distribution:
* Good at vertex id locality
* Ability to split ranges easily
* Can cause hotspots for hot ranges
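The trade-offs between hash-based and range-based partitioning can be illustrated with a minimal sketch. This is not Giraph's API: the class PartitionSketch, its methods, and the choice of long vertex ids are all hypothetical, chosen only to show how each scheme maps a vertex id to a partition and why splitting a range is cheap.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical names, not part of Giraph. */
final class PartitionSketch {
    private PartitionSketch() {}

    /**
     * Hash-based: partition = hash(vertexId) mod partitionCount.
     * Neighboring ids land in different partitions, which balances
     * load but loses vertex id locality.
     */
    static int hashPartition(long vertexId, int partitionCount) {
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(Long.hashCode(vertexId), partitionCount);
    }

    /**
     * Range-based: maxIds is sorted ascending and maxIds[i] is the largest
     * vertex id owned by partition i. Neighboring ids stay together, so a
     * heavily used id range becomes a hotspot on a single partition.
     */
    static int rangePartition(long vertexId, long[] maxIds) {
        int pos = Arrays.binarySearch(maxIds, vertexId);
        // A negative result encodes the insertion point as -(point + 1).
        int index = pos >= 0 ? pos : -(pos + 1);
        if (index >= maxIds.length) {
            throw new IllegalArgumentException("vertex id beyond last range: " + vertexId);
        }
        return index;
    }

    /**
     * Splits the range containing splitId by adding a new boundary there.
     * Ids up to and including splitId form the new lower partition.
     */
    static long[] splitRange(long[] maxIds, long splitId) {
        int pos = Arrays.binarySearch(maxIds, splitId);
        if (pos >= 0) {
            return maxIds.clone(); // splitId is already a boundary
        }
        int insertAt = -(pos + 1);
        long[] result = new long[maxIds.length + 1];
        System.arraycopy(maxIds, 0, result, 0, insertAt);
        result[insertAt] = splitId;
        System.arraycopy(maxIds, insertAt, result, insertAt + 1, maxIds.length - insertAt);
        return result;
    }
}
```

In this sketch, splitting a hot range only adds one boundary to the sorted array, whereas changing the partition count under the hash scheme would reassign most vertices.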