Avery Ching commented on GIRAPH-11:

Regarding the difference between hash based and hash range based partitioning: 
it refers to how the hash code is assigned to a partition.  The application dev 
will implement hashCode() for their vertex id, and the assignment of that 
hashCode() to a partition can then be hashed (i.e. hashCode() % # partitions) 
or range based ([0, a), [a, b), ... etc).  Hope that's clearer.  Code will 
help; it's coming soon, by the middle of next week I hope.
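
Until the real code lands, here is a rough sketch of the two assignment 
schemes.  It is not the Giraph implementation: the class name, the method 
names and the splitPoints array are all made up for illustration, and using 
Math.floorMod instead of a plain % is my assumption, there to keep negative 
hash codes from producing a negative partition index.

```java
import java.util.Arrays;

/** Illustrative sketch only; these names are hypothetical, not Giraph APIs. */
final class PartitionSketch {

    /**
     * Hash based: hashCode() % # partitions.
     * floorMod keeps the result in [0, numPartitions) for negative hash codes.
     */
    static int hashPartition(int hashCode, int numPartitions) {
        return Math.floorMod(hashCode, numPartitions);
    }

    /**
     * Range based: splitPoints must be strictly increasing (an assumption of
     * this sketch). Partition 0 holds every hash code below splitPoints[0],
     * partition i holds [splitPoints[i - 1], splitPoints[i]), and the last
     * partition holds everything at or above the last split point.
     */
    static int rangePartition(int hashCode, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, hashCode);
        // Found: the value is the lower bound of partition pos + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        int[] splitPoints = {100, 200}; // three ranges, chosen arbitrarily
        for (int hashCode : new int[] {99, 100, 101}) {
            System.out.println(hashCode
                + " -> hash partition " + hashPartition(hashCode, 3)
                + ", range partition " + rangePartition(hashCode, splitPoints));
        }
    }
}
```

With the hash based scheme, neighbouring hash codes land on different 
partitions; with the range based scheme they stay together unless they 
straddle a split point.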

> Improve the graph distribution of Giraph
> ----------------------------------------
>                 Key: GIRAPH-11
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-11
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

