[ https://issues.apache.org/jira/browse/SPARK-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Larry Xiao updated SPARK-2981:
------------------------------
    Description:
In PartitionStrategy.scala a PartitionID is calculated by multiplying the VertexId by a mixingPrime (1125899906842597L), casting the product to Int, and then taking it mod numParts.
The product does not fit in an Int, so the cast to Int keeps only its low 32 bits:
{quote}
scala> (1125899906842597L*1).toInt
res1: Int = -27
scala> (1125899906842597L*2).toInt
res2: Int = -54
scala> (1125899906842597L*3).toInt
res3: Int = -81
{quote}
As the cast produces numbers that are multiples of 3, the partitioning is not usable when the number of partitions is a multiple of 3.
For example, when you partition to 6 or 9 parts:
{quote}
14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: psrc Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0))
14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: pdst Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0))
14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: psrc Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0))
14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: pdst Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0))
{quote}
So the vertices are partitioned to 0 and 3 for 6 parts, and to 0 only for 9 parts.
I think the solution is to cast after mod.
{quote}
scala> (1125899906842597L*3)
res4: Long = 3377699720527791
scala> (1125899906842597L*3) % 9
res5: Long = 3
scala> ((1125899906842597L*3) % 9).toInt
res5: Int = 3
{quote}

  was: the same text, without the sentence stating which partitions receive vertices for 6 and 9 parts.

> PartitionStrategy: VertexID hash overflow
> -----------------------------------------
>
>                 Key: SPARK-2981
>                 URL: https://issues.apache.org/jira/browse/SPARK-2981
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.2
>            Reporter: Larry Xiao
>              Labels: newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
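
The cast-order problem reported in this issue can be reproduced outside Spark. The sketch below is only an illustration, not the GraphX code: the object name `PartitionOverflowSketch`, the helpers `castThenMod`, `modThenCast` and `reached`, the `math.abs` calls, and the vertex-id range 1 to 1000 are all assumptions made for the example. It relies on one arithmetic fact: 1125899906842597 equals 2^50 - 27, so the low 32 bits of `id * mixingPrime` are those of `-27 * id`, which is why every cast result in the transcript is a multiple of 27.

```scala
object PartitionOverflowSketch {
  // Constant quoted in the issue; it equals 2^50 - 27.
  val mixingPrime: Long = 1125899906842597L

  // Cast first, then mod: the order the issue reports as broken.
  // math.abs is an assumption of this sketch, used to keep the index non-negative.
  def castThenMod(vid: Long, numParts: Int): Int =
    math.abs((vid * mixingPrime).toInt) % numParts

  // Mod first, then cast: the order proposed in the issue.
  def modThenCast(vid: Long, numParts: Int): Int =
    (math.abs(vid * mixingPrime) % numParts).toInt

  // Distinct partition ids reached by the hypothetical vertex ids 1..n.
  def reached(f: (Long, Int) => Int, numParts: Int, n: Int = 1000): Set[Int] =
    (1L to n.toLong).map(f(_, numParts)).toSet
}
```

With these definitions, `reached(castThenMod, 6)` contains only 0 and 3 and `reached(castThenMod, 9)` contains only 0, matching the logged distributions, while `reached(modThenCast, 6)` and `reached(modThenCast, 9)` cover every partition id. For vertex ids large enough that the Long product itself wraps around, `math.abs` can still return a negative value for `Long.MinValue`, so a real fix may need to handle that edge case separately.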