[ https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching updated GIRAPH-11:
------------------------------

    Affects Version/s: 0.70.0

> Improve the graph distribution of Giraph
> ----------------------------------------
>
>                 Key: GIRAPH-11
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-11
>             Project: Giraph
>          Issue Type: Improvement
>    Affects Versions: 0.70.0
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted.
> If the user data is not sorted by vertex id, the user must first run a
> MapReduce or Pig job to generate a sorted dataset. This is often
> inconvenient.
>
> Giraph graph partitioning is currently range based, and this approach has
> both advantages and disadvantages. The proposal of this JIRA is to allow
> both range-based and hash-based partitioning and to give the user more
> flexibility.
>
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range
>   based)
> * Ability to provide user-specific hints about partitions
>
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
>
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges
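
To make the trade-off between the two schemes concrete, here is a minimal
sketch in Java. It is not Giraph's API: the class PartitionSketch, the methods
hashPartition and rangePartition, the use of long vertex ids, and the
upperBounds array are all hypothetical names and assumptions chosen for
illustration only.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names come from Giraph. */
final class PartitionSketch {

    /** Hash-based: map a vertex id to a partition without regard to id order. */
    static int hashPartition(long vertexId, int numPartitions) {
        // Math.floorMod keeps the result in [0, numPartitions) even when the hash is negative.
        return Math.floorMod(Long.hashCode(vertexId), numPartitions);
    }

    /**
     * Range-based: upperBounds must be sorted ascending, and upperBounds[i] is
     * the largest id (inclusive) owned by partition i. Ids above the last bound
     * fall into one extra, final partition.
     */
    static int rangePartition(long vertexId, long[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, vertexId);
        // On a miss, binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is the index of the first bound greater than the id.
        return pos >= 0 ? pos : -pos - 1;
    }

    public static void main(String[] args) {
        long[] upperBounds = {99L, 199L, 299L}; // assumed example ranges
        for (long id : new long[] {5L, 6L, 150L, 1000L}) {
            System.out.printf("id=%d hash=%d range=%d%n",
                    id, hashPartition(id, 4), rangePartition(id, upperBounds));
        }
    }
}
```

With these assumed bounds, the neighbouring ids 5 and 6 share a range
partition but land in different hash partitions, which is the locality
difference described in the lists above. Splitting a hot range would amount to
inserting one more bound into upperBounds.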