Hyunsik Choi commented on GIRAPH-11:
I'm sorry for delaying the review. I'm now digging into your patch.
It looks great! Based on this work, we can consider more advanced graph
partitioners that take the number of edge cuts across graph partitions into account.
I need about one more day for more investigation because the patch is somewhat
complicated for me :)
Besides, for a deeper review, I would like to run some tests and
trace them. Your patch needs a rebase. Could you rebase it?
Thank you :)
> Improve the graph distribution of Giraph
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
> Issue Type: Improvement
> Affects Versions: 0.70.0
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-11.diff
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted.
> If the user data is not sorted by the vertex id, they must first run a
> MapReduce or Pig job to generate a sorted dataset. This is often a bit
> Giraph graph partitioning is currently range based and there are some
> advantages and disadvantages of this approach. The proposal of this JIRA
> would be to allow for both range and hash based partitioning and provide more
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges
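
The trade-offs listed above can be illustrated with a minimal sketch. This is not Giraph's API or the code in the attached patch; the function names (hash_partition, range_partition, split_range) and the split points are hypothetical, and Python's built-in hash() stands in for whatever hash function a real partitioner would use.

```python
from bisect import bisect_right, insort
from collections import Counter


def hash_partition(vertex_id: int, num_partitions: int) -> int:
    """Hypothetical hash-based scheme: partition = hash(id) mod partition count."""
    # Python's % with a positive modulus never returns a negative value.
    return hash(vertex_id) % num_partitions


def range_partition(vertex_id: int, split_points: list[int]) -> int:
    """Hypothetical range-based scheme over sorted split points.

    Partition 0 holds ids below split_points[0]; partition i holds ids in
    [split_points[i - 1], split_points[i]); the last partition holds the rest.
    """
    return bisect_right(split_points, vertex_id)


def split_range(split_points: list[int], new_split: int) -> list[int]:
    """Split one range in two by adding a split point (input is not modified)."""
    updated = list(split_points)
    if new_split not in updated:
        insort(updated, new_split)
    return updated


if __name__ == "__main__":
    hot_ids = range(100, 110)   # ten consecutive vertex ids
    splits = [100, 200, 300]    # assumed split points, giving four ranges

    # Hash-based: consecutive ids are scattered (good balance, poor id locality).
    print("hash :", Counter(hash_partition(v, 4) for v in hot_ids))

    # Range-based: consecutive ids share one partition (good locality, possible hotspot).
    print("range:", Counter(range_partition(v, splits) for v in hot_ids))

    # Splitting the hot range relieves the hotspot without touching other ranges.
    finer = split_range(splits, 105)
    print("split:", Counter(range_partition(v, finer) for v in hot_ids))
```

In this sketch, splitting a range only adds one boundary, which is why repartitioning is cheap for the range-based scheme, whereas changing the partition count in the hash-based scheme would reassign most vertices.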