spark git commit: [SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads to slow convergence

2016-12-15 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 172a52f5d -> 78062b852 [SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads to slow convergence ## What changes were proposed in this pull request? Change the initial value in all PageRank implementations to be
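
The snippet is cut off before the new initial value, so the following is only a hypothetical sketch (not the GraphX source) of why the starting value affects convergence. It assumes the unnormalized update `rank(v) = resetProb + (1 - resetProb) * sum over in-neighbors u of rank(u) / outDegree(u)`. Without dangling vertices, the fixed point of that update has ranks summing to the vertex count, so a start far from that average costs extra iterations. `PageRankInitSketch` and `iterationsToConverge` are illustrative names, and the initial value is a parameter here rather than the value chosen in the patch.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not the GraphX code: unnormalized PageRank on an
 * adjacency list, with the initial rank exposed as a parameter.
 */
final class PageRankInitSketch {

    /** Iterates until the largest per-vertex change is below tol; returns the iteration count. */
    static int iterationsToConverge(int[][] outEdges, double resetProb,
                                    double initialRank, double tol, int maxIter) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, initialRank);
        for (int iter = 1; iter <= maxIter; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, resetProb);  // every vertex receives the reset term
            for (int u = 0; u < n; u++) {
                // Assumes every vertex has at least one out-edge (no dangling vertices).
                double share = rank[u] / outEdges[u].length;
                for (int v : outEdges[u]) {
                    next[v] += (1 - resetProb) * share;
                }
            }
            double maxDelta = 0.0;
            for (int v = 0; v < n; v++) {
                maxDelta = Math.max(maxDelta, Math.abs(next[v] - rank[v]));
            }
            rank = next;
            if (maxDelta < tol) {
                return iter;
            }
        }
        return maxIter;  // did not converge within maxIter iterations
    }
}
```

For example, on the hypothetical 4-vertex cycle `{{1}, {2}, {3}, {0}}` with `resetProb = 0.15` and `tol = 1e-6`, starting every rank at 1.0 converges after one iteration (the first update changes nothing), while starting at 0.15 removes only 15% of the remaining error per iteration and needs dozens of iterations.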

spark git commit: [SPARK-9436] [GRAPHX] Pregel simplification patch

2015-07-29 Thread ankurdave
) } ``` This can be simplified with one join. ankurdave proposed a patch based on our discussion in the mailing list: https://www.mail-archive.com/dev@spark.apache.org/msg10316.html Author: Alexander Ulanov na...@yandex.ru Closes #7749 from avulanov/SPARK-9436-pregel and squashes the following

spark git commit: [SPARK-9109] [GRAPHX] Keep the cached edge in the graph

2015-07-17 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master eba6a1af4 -> 587c315b2 [SPARK-9109] [GRAPHX] Keep the cached edge in the graph The change here is to keep the cached RDDs in the graph object so that when graph.unpersist() is called, these RDDs are correctly unpersisted. ```java

spark git commit: [SPARK-9109] [GRAPHX] Keep the cached edge in the graph

2015-07-17 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.4 bb1401507 -> f34f3d71f [SPARK-9109] [GRAPHX] Keep the cached edge in the graph The change here is to keep the cached RDDs in the graph object so that when graph.unpersist() is called, these RDDs are correctly unpersisted. ```java

spark git commit: [SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number of partitions

2015-07-14 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master d267c2834 -> 0a4071eab [SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number of partitions See https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb Author: Andrew Ray ray.and...@gmail.com Closes #7104 from
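
The message only links to a notebook, so here is a hypothetical sketch of the general problem and one way around it (not the merged patch). A 2D strategy that builds a square grid with `ceil(sqrt(numParts))` cells per side and folds cell indices back with `% numParts` must map several grid cells onto some partitions whenever `numParts` is not a perfect square, which unbalances load. Laying the cells out column by column, with a shorter last column, gives every partition exactly one cell. `Grid2DPartitionSketch`, `cell`, and the mixing constant are illustrative assumptions.

```java
/**
 * Hypothetical sketch of a 2D edge partitioner that stays balanced when
 * numParts is not a perfect square. Names and the mixing constant are
 * illustrative, not taken from GraphX.
 */
final class Grid2DPartitionSketch {
    // Large odd constant used to scatter vertex ids before the modulo (assumed value).
    private static final long MIXING_PRIME = 1125899906842597L;

    /** Maps an edge (src, dst) to a partition in [0, numParts); numParts must be >= 1. */
    static int getPartition(long src, long dst, int numParts) {
        // floorMod keeps the hash non-negative even when the product overflows.
        int colHash = (int) Math.floorMod(src * MIXING_PRIME, (long) numParts);
        int rowHash = (int) Math.floorMod(dst * MIXING_PRIME, (long) numParts);
        return cell(colHash, rowHash, numParts);
    }

    /**
     * Lays numParts cells out column by column: cols = ceil(sqrt(numParts)),
     * every column except the last has `rows` cells, and the last column holds
     * the remainder, so each partition index is produced by exactly one cell.
     * colHash must be in [0, numParts); rowHash must be non-negative.
     */
    static int cell(int colHash, int rowHash, int numParts) {
        int cols = (int) Math.ceil(Math.sqrt(numParts));
        int rows = (numParts + cols - 1) / cols;         // ceil(numParts / cols)
        int lastColRows = numParts - rows * (cols - 1);  // always >= 1
        int col = colHash / rows;                        // in [0, cols)
        int rowsInCol = (col < cols - 1) ? rows : lastColRows;
        int row = rowHash % rowsInCol;
        return col * rows + row;
    }
}
```

Because the source vertex fixes the column and the destination vertex fixes the row, all edges of one vertex still fall into a single column or row of the layout, which is the property 2D partitioning relies on.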

spark git commit: [SPARK-6736][GraphX][Doc]Example of Graph#aggregateMessages has error

2015-04-07 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 6f0d55d76 -> ae980eb41 [SPARK-6736][GraphX][Doc]Example of Graph#aggregateMessages has error The example of Graph#aggregateMessages has an error. Since aggregateMessages is a method of Graph, it should be written as rawGraph.aggregateMessages

spark git commit: [SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference

2015-03-26 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master aad003227 -> 39fb57968 [SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference Adds a `Graph#minus` method which will return only unique `VertexId`s from the calling `VertexRDD`. To demonstrate a basic example with
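
As a rough illustration of the set-difference semantics described above, the sketch below uses plain Java maps (vertex id to attribute) in place of distributed `VertexRDD`s. `VertexMinusSketch` is a hypothetical name, not a GraphX API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of set difference on vertex maps (VertexId -> attribute),
 * using in-memory maps instead of VertexRDDs.
 */
final class VertexMinusSketch {
    /** Keeps only the entries of `self` whose vertex id does not appear in `other`. */
    static <VD> Map<Long, VD> minus(Map<Long, VD> self, Map<Long, ?> other) {
        Map<Long, VD> result = new LinkedHashMap<>();
        for (Map.Entry<Long, VD> e : self.entrySet()) {
            // Only the id matters; attribute values in `other` are ignored.
            if (!other.containsKey(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```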

spark git commit: [SPARK-5922][GraphX]: Add diff(other: RDD[VertexId, VD]) in VertexRDD

2015-03-16 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master aa6536fa3 -> 45f4c6612 [SPARK-5922][GraphX]: Add diff(other: RDD[VertexId, VD]) in VertexRDD Changed method invocation of 'diff' to match that of 'innerJoin' and 'leftJoin' from VertexRDD[VD] to RDD[(VertexId, VD)]. This change maintains
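
A hypothetical sketch of what `diff` computes, assuming the behavior documented for GraphX's `VertexRDD.diff`: among ids present on both sides, keep only those whose values differ, taking the value from `other`. Plain maps stand in for RDDs, and `VertexDiffSketch` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of VertexRDD-style diff semantics on in-memory maps. */
final class VertexDiffSketch {
    /**
     * For ids present in both maps, returns the entries whose values differ,
     * with each value taken from `other`. Ids found on only one side are dropped.
     */
    static <VD> Map<Long, VD> diff(Map<Long, VD> self, Map<Long, VD> other) {
        Map<Long, VD> result = new LinkedHashMap<>();
        for (Map.Entry<Long, VD> e : other.entrySet()) {
            Long id = e.getKey();
            if (self.containsKey(id) && !Objects.equals(self.get(id), e.getValue())) {
                result.put(id, e.getValue());
            }
        }
        return result;
    }
}
```

Taking `other` as a generic id-to-value collection, rather than requiring the same indexed type as `self`, mirrors the intent of widening the parameter from `VertexRDD[VD]` to `RDD[(VertexId, VD)]`.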

spark git commit: [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing

2015-02-25 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.3 eaffc6edd -> 8073767f5 [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing Fixes the issue whereby when VertexRDDs are `diff`ed, `innerJoin`ed, or `leftJoin`ed and have different partition sizes they fail under the

spark git commit: [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing

2015-02-25 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master a777c65da -> 9f603fce7 [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing Fixes the issue whereby when VertexRDDs are `diff`ed, `innerJoin`ed, or `leftJoin`ed and have different partition sizes they fail under the

spark git commit: [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing

2015-02-25 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 a9abcaa2c -> 00112baf9 [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing Fixes the issue whereby when VertexRDDs are `diff`ed, `innerJoin`ed, or `leftJoin`ed and have different partition sizes they fail under the

spark git commit: SPARK-3290 [GRAPHX] No unpersist calls in SVDPlusPlus

2015-02-14 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.3 152147f5f -> db5747921 SPARK-3290 [GRAPHX] No unpersist calls in SVDPlusPlus This just unpersist()s each RDD in this code that was cache()ed. Author: Sean Owen so...@cloudera.com Closes #4234 from srowen/SPARK-3290 and squashes the

spark git commit: [SPARK-5343][GraphX]: ShortestPaths traverses backwards

2015-02-10 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.3 bba095399 -> 5be8902f7 [SPARK-5343][GraphX]: ShortestPaths traverses backwards Corrected the logic with ShortestPaths so that the calculation will run forward rather than backwards. Output before looked like: ```scala import

spark git commit: [SPARK-5343][GraphX]: ShortestPaths traverses backwards

2015-02-10 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master fd2c032f9 -> 582096128 [SPARK-5343][GraphX]: ShortestPaths traverses backwards Corrected the logic with ShortestPaths so that the calculation will run forward rather than backwards. Output before looked like: ```scala import

spark git commit: [SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a partitioner of EdgeRDDImp...

2015-01-23 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master cef1f092a -> e224dbb01 [SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a partitioner of EdgeRDDImp... If the value of 'spark.default.parallelism' does not match the number of partitions in EdgePartition(EdgeRDDImpl), the

spark git commit: [SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a partitioner of EdgeRDDImp...

2015-01-23 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 2ea782a9d -> 73cb806f7 [SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a partitioner of EdgeRDDImp... If the value of 'spark.default.parallelism' does not match the number of partitions in EdgePartition(EdgeRDDImpl),

spark git commit: [SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop

2015-01-21 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 e90f6b5c6 -> 37db20c94 [SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop I looked into GraphGenerators#chooseCell, and found that chooseCell can't generate more edges than

spark git commit: [SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop

2015-01-21 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 7450a992b -> 3ee3ab592 [SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop I looked into GraphGenerators#chooseCell, and found that chooseCell can't generate more edges than pow(2, (2
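
The message is cut off before the exact bound, so the following is a hypothetical sketch of the guard pattern only, with its own bound: a generator that rejects duplicate edges loops forever if asked for more distinct edges than its adjacency matrix has cells, so it should validate `numEdges` up front. For simplicity the sketch swaps R-MAT's recursive quadrant choice for a uniform choice of cell; `DistinctEdgeSamplerSketch` and `side` are illustrative assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical sketch: rejection sampling of distinct edges from a side x side
 * adjacency matrix, with an upfront bound check so the loop always terminates.
 * Uniform cell choice stands in for R-MAT's recursive quadrant choice.
 */
final class DistinctEdgeSamplerSketch {
    static Set<Long> sampleDistinctEdges(int side, int numEdges, long seed) {
        long maxDistinct = (long) side * side;  // number of cells in the matrix
        if (numEdges > maxDistinct) {
            // Without this check the loop below could never reach numEdges.
            throw new IllegalArgumentException(
                "numEdges " + numEdges + " exceeds the " + maxDistinct + " distinct cells available");
        }
        Random rng = new Random(seed);
        Set<Long> edges = new LinkedHashSet<>();
        while (edges.size() < numEdges) {
            long src = rng.nextInt(side);
            long dst = rng.nextInt(side);
            // Encode (src, dst) as one cell id; the set silently rejects duplicates.
            edges.add(src * side + dst);
        }
        return edges;
    }
}
```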

spark git commit: [SPARK-4917] Add a function to convert into a graph with canonical edges in GraphOps

2015-01-08 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 8d45834de -> f825e193f [SPARK-4917] Add a function to convert into a graph with canonical edges in GraphOps Convert bi-directional edges into uni-directional ones instead of 'canonicalOrientation' in GraphLoader.edgeListFile. This
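
A minimal sketch of converting to canonical edges. It assumes "canonical" means each edge is oriented so the smaller vertex id comes first, and that edges collapsing onto the same pair are combined with a caller-supplied merge function. `CanonicalEdgesSketch` and its `Edge` type are hypothetical; this is not the `GraphOps` code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/**
 * Hypothetical sketch: orient every edge so that src <= dst and merge the
 * attributes of edges that end up on the same (src, dst) pair.
 */
final class CanonicalEdgesSketch {
    static final class Edge<A> {
        final long src;
        final long dst;
        final A attr;

        Edge(long src, long dst, A attr) {
            this.src = src;
            this.dst = dst;
            this.attr = attr;
        }
    }

    /** Returns (minId, maxId) -> merged attribute. */
    static <A> Map<List<Long>, A> canonicalize(List<Edge<A>> edges, BinaryOperator<A> mergeFunc) {
        Map<List<Long>, A> result = new LinkedHashMap<>();
        for (Edge<A> e : edges) {
            // Put the smaller id first so u->v and v->u land on the same key.
            List<Long> key = List.of(Math.min(e.src, e.dst), Math.max(e.src, e.dst));
            result.merge(key, e.attr, mergeFunc);
        }
        return result;
    }
}
```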

spark git commit: [Minor] Fix comments for GraphX 2D partitioning strategy

2015-01-06 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master a6394bc2c -> 5e3ec1110 [Minor] Fix comments for GraphX 2D partitioning strategy The number of vertices on the matrix (v0 to v11) is 12. Also, I think the same block overlaps in this strategy. This is a minor PR, so I didn't file it in JIRA. Author:

spark git commit: [SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark

2014-12-07 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master e895e0cbe -> 2e6b736b0 [SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark This patch simply replaces a native quicksort with Sorter(TimSort) in Spark. It could yield performance gains of ~8% in my quick

spark git commit: [SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark

2014-12-07 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 27d9f13af -> a4ae7c8b5 [SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark This patch simply replaces a native quicksort with Sorter(TimSort) in Spark. It could yield performance gains of ~8% in my quick

spark git commit: [SPARK-4620] Add unpersist in Graph and GraphImpl

2014-12-07 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 2e6b736b0 -> 8817fc7fe [SPARK-4620] Add unpersist in Graph and GraphImpl Add an interface to uncache both vertices and edges of Graph/GraphImpl. This interface is useful when iterative graph operations build a new graph in each iteration, and the

spark git commit: [SPARK-4620] Add unpersist in Graph and GraphImpl

2014-12-07 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 a4ae7c8b5 -> 6b9e8b081 [SPARK-4620] Add unpersist in Graph and GraphImpl Add an interface to uncache both vertices and edges of Graph/GraphImpl. This interface is useful when iterative graph operations build a new graph in each iteration, and the

spark git commit: [SPARK-3623][GraphX] GraphX should support the checkpoint operation

2014-12-06 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 6eb1b6f62 -> e895e0cbe [SPARK-3623][GraphX] GraphX should support the checkpoint operation Author: GuoQiang Li wi...@qq.com Closes #2631 from witgo/SPARK-3623 and squashes the following commits: a70c500 [GuoQiang Li] Remove java related

spark git commit: [SPARK-3623][GraphX] GraphX should support the checkpoint operation

2014-12-06 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 11446a648 -> 27d9f13af [SPARK-3623][GraphX] GraphX should support the checkpoint operation Author: GuoQiang Li wi...@qq.com Closes #2631 from witgo/SPARK-3623 and squashes the following commits: a70c500 [GuoQiang Li] Remove java

spark git commit: [SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain

2014-12-03 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 17c162f66 -> 77be8b986 [SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 The f closure of `PartitionsRDD(ZippedPartitionsRDD2)` contains a

spark git commit: [SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain

2014-12-03 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 528cce8bc -> 667f7ff44 [SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 The f closure of `PartitionsRDD(ZippedPartitionsRDD2)`

spark git commit: [SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage

2014-12-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 5da21f07d -> fc0a1475e [SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 Iterative GraphX applications always have long lineage, while

spark git commit: [SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage

2014-12-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 5e026a3e6 -> f1859fc18 [SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 Iterative GraphX applications always have long lineage, while

spark git commit: [SPARK-4672][GraphX]Non-transient PartitionsRDDs will lead to StackOverflow error

2014-12-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master fc0a1475e -> 17c162f66 [SPARK-4672][GraphX]Non-transient PartitionsRDDs will lead to StackOverflow error The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 In a nutshell, if `val partitionsRDD` in EdgeRDDImpl and

spark git commit: [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx

2014-11-06 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 23eaf0e12 -> d15c6e9dc [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx At first, srcIds is not initialized and its entries are all 0, so we use edgeArray(0).srcId to initialize currSrcId. Author: lianhuiwang lianhuiwan...@gmail.com Closes

spark git commit: [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx

2014-11-06 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.2 aaaeaf939 -> 9061bc4e1 [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx At first, srcIds is not initialized and its entries are all 0, so we use edgeArray(0).srcId to initialize currSrcId. Author: lianhuiwang lianhuiwan...@gmail.com Closes

spark git commit: [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx

2014-11-06 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.1 c58c1bb83 -> 0a40eac25 [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx At first, srcIds is not initialized and its entries are all 0, so we use edgeArray(0).srcId to initialize currSrcId. Author: lianhuiwang lianhuiwan...@gmail.com Closes

spark git commit: [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx

2014-11-06 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.0 49224fd0f -> 76c20cac9 [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx At first, srcIds is not initialized and its entries are all 0, so we use edgeArray(0).srcId to initialize currSrcId. Author: lianhuiwang lianhuiwan...@gmail.com Closes
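
The SPARK-4249 fix seeds the running `currSrcId` from the first sorted edge instead of from a not-yet-filled `srcIds` array whose entries default to 0. A hypothetical sketch of that pattern, building a srcId-to-first-position index over edges sorted by source, is below. It sorts with `Arrays.sort` and a comparator, which on object arrays uses the JDK's stable TimSort-based merge sort. `SrcIndexSketch` and the `{srcId, dstId}` pair layout are illustrative assumptions, not the `EdgePartitionBuilder` code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a clustered source index: edges sorted by srcId, plus a
 * map from each srcId to the position of its first edge.
 */
final class SrcIndexSketch {
    /** edges[i] = {srcId, dstId}; sorts in place and returns srcId -> first position. */
    static Map<Long, Integer> buildIndex(long[][] edges) {
        Arrays.sort(edges, Comparator.<long[]>comparingLong(e -> e[0]).thenComparingLong(e -> e[1]));
        Map<Long, Integer> index = new LinkedHashMap<>();
        if (edges.length == 0) {
            return index;
        }
        // Seed from the first edge, not from a default 0 that may not be a real srcId.
        long currSrcId = edges[0][0];
        index.put(currSrcId, 0);
        for (int i = 1; i < edges.length; i++) {
            if (edges[i][0] != currSrcId) {
                currSrcId = edges[i][0];
                index.put(currSrcId, i);
            }
        }
        return index;
    }
}
```

Seeding from a default 0 instead would insert a spurious entry for vertex 0 whenever no edge actually starts there.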

git commit: [graphX] GraphOps: random pick vertex bug

2014-09-29 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 0bbe7faef -> 51229ff7f [graphX] GraphOps: random pick vertex bug When `numVertices > 50`, probability is set to 0. This would cause an infinite loop. Author: yingjieMiao ying...@42go.com Closes #2553 from yingjieMiao/graphx and squashes the
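
The snippet does not show the code. A plausible way for a sampling probability to collapse to 0 is integer division such as `50 / numVertices`, which yields 0 for every vertex count above 50; the hypothetical sketch below assumes that cause. It picks a vertex by sampling each one with that probability and retrying until something is selected, so a zero probability means the retry never ends. `PickRandomVertexSketch` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of picking a random vertex by Bernoulli sampling with retries. */
final class PickRandomVertexSketch {
    /** Integer division: evaluates to 0 whenever numVertices > 50. */
    static double buggyProbability(long numVertices) {
        return 50 / numVertices;  // long division happens before the widening to double
    }

    /** Floating-point division keeps the intended fraction. */
    static double fixedProbability(long numVertices) {
        return 50.0 / numVertices;
    }

    /** Samples every vertex with the given probability and retries until one is selected. */
    static long pickRandomVertex(List<Long> vertices, double probability, Random rng) {
        if (vertices.isEmpty() || probability <= 0.0) {
            // With no vertices or zero probability the retry loop below would never finish.
            throw new IllegalArgumentException("need at least one vertex and a positive probability");
        }
        while (true) {
            List<Long> selected = new ArrayList<>();
            for (long v : vertices) {
                if (rng.nextDouble() < probability) {
                    selected.add(v);
                }
            }
            if (!selected.isEmpty()) {
                // Choose uniformly among this round's selections.
                return selected.get(rng.nextInt(selected.size()));
            }
        }
    }
}
```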

git commit: [SPARK-2062][GraphX] VertexRDD.apply does not use the mergeFunc

2014-09-19 Thread ankurdave
Xiao xia...@sjtu.edu.cn Author: Blie Arkansol xia...@sjtu.edu.cn Author: Ankur Dave ankurd...@gmail.com Closes #1903 from larryxiao/2062 and squashes the following commits: 625aa9d [Blie Arkansol] Merge pull request #1 from ankurdave/SPARK-2062 476770b [Ankur Dave] ShippableVertexPartition.initFrom
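
The subject says `VertexRDD.apply` did not use its `mergeFunc`. As a hypothetical, in-memory sketch of the intended behavior (an assumption based on the subject line), duplicate vertex ids are combined with the caller's merge function rather than one value silently winning. `MergeDuplicateVerticesSketch` and `fromPairs` are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical sketch: build vertex attributes from (id, attr) pairs, merging duplicate ids. */
final class MergeDuplicateVerticesSketch {
    static <VD> Map<Long, VD> fromPairs(List<Map.Entry<Long, VD>> pairs, BinaryOperator<VD> mergeFunc) {
        Map<Long, VD> result = new LinkedHashMap<>();
        for (Map.Entry<Long, VD> p : pairs) {
            // Map.merge applies mergeFunc only when the id is already present.
            result.merge(p.getKey(), p.getValue(), mergeFunc);
        }
        return result;
    }
}
```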

git commit: [HOTFIX] [SPARK-3400] Revert 9b225ac fix GraphX EdgeRDD zipPartitions

2014-09-04 Thread ankurdave
. Author: Ankur Dave ankurd...@gmail.com Closes #2271 from ankurdave/SPARK-3400 and squashes the following commits: 10c2a97 [Ankur Dave] [HOTFIX] [SPARK-3400] Revert 9b225ac fix GraphX EdgeRDD zipPartitions (cherry picked from commit 00362dac976cd05b06638deb11d990d612429e0b) Signed-off-by: Ankur

git commit: [HOTFIX] [SPARK-3400] Revert 9b225ac fix GraphX EdgeRDD zipPartitions

2014-09-04 Thread ankurdave
. Author: Ankur Dave ankurd...@gmail.com Closes #2271 from ankurdave/SPARK-3400 and squashes the following commits: 10c2a97 [Ankur Dave] [HOTFIX] [SPARK-3400] Revert 9b225ac fix GraphX EdgeRDD zipPartitions Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip

git commit: [SPARK-3263][GraphX] Fix changes made to GraphGenerator.logNormalGraph in PR #720

2014-09-03 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 6481d2742 -> e5d376801 [SPARK-3263][GraphX] Fix changes made to GraphGenerator.logNormalGraph in PR #720 PR #720 made multiple changes to GraphGenerator.logNormalGraph including: * Replacing the call to functions for generating random

git commit: [SPARK-1986][GraphX]move lib.Analytics to org.apache.spark.examples

2014-09-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 644e31524 -> 7c92b49d6 [SPARK-1986][GraphX]move lib.Analytics to org.apache.spark.examples to support ~/spark/bin/run-example GraphXAnalytics triangles /soc-LiveJournal1.txt --numEPart=256 Author: Larry Xiao xia...@sjtu.edu.cn Closes

git commit: [SPARK-3123][GraphX]: override the setName function to set EdgeRDD's name manually just as VertexRDD does.

2014-09-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 7c92b49d6 -> 7c9bbf172 [SPARK-3123][GraphX]: override the setName function to set EdgeRDD's name manually just as VertexRDD does. Author: uncleGen husty...@gmail.com Closes #2033 from uncleGen/master_origin and squashes the following

git commit: [SPARK-2981][GraphX] EdgePartition1D Int overflow

2014-09-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 7c9bbf172 -> aa7de128c [SPARK-2981][GraphX] EdgePartition1D Int overflow Minor fix; details are here: https://issues.apache.org/jira/browse/SPARK-2981 Author: Larry Xiao xia...@sjtu.edu.cn Closes #1902 from larryxiao/2981 and squashes the

git commit: [SPARK-2981][GraphX] EdgePartition1D Int overflow

2014-09-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.1 7267e402c -> 9b0cff2d4 [SPARK-2981][GraphX] EdgePartition1D Int overflow Minor fix; details are here: https://issues.apache.org/jira/browse/SPARK-2981 Author: Larry Xiao xia...@sjtu.edu.cn Closes #1902 from larryxiao/2981 and squashes

git commit: [SPARK-2981][GraphX] EdgePartition1D Int overflow

2014-09-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.0 5481196ab -> d60f60ccc [SPARK-2981][GraphX] EdgePartition1D Int overflow Minor fix; details are here: https://issues.apache.org/jira/browse/SPARK-2981 Author: Larry Xiao xia...@sjtu.edu.cn Closes #1902 from larryxiao/2981 and squashes
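
The details live in the linked JIRA and are not reproduced here. As a hypothetical illustration of the general hazard (not a quote of the pre-fix or patched code), casting a 64-bit hash product to a 32-bit int before taking the modulo can wrap to a negative value and produce an invalid partition id. Reducing in 64-bit arithmetic with a non-negative modulo first avoids that. The class name and mixing constant are illustrative assumptions.

```java
/** Hypothetical sketch contrasting an overflow-prone and a safe 1D partition hash. */
final class EdgePartition1DOverflowSketch {
    // Large odd mixing constant (illustrative; 2^50 - 27).
    private static final long MIXING_PRIME = 1125899906842597L;

    /** Overflow-prone pattern: truncating to int before the modulo can give a negative id. */
    static int truncateThenMod(long src, int numParts) {
        return (int) (Math.abs(src) * MIXING_PRIME) % numParts;
    }

    /** Safe pattern: reduce in 64-bit arithmetic with floorMod, then narrow to int. */
    static int modThenTruncate(long src, int numParts) {
        return (int) Math.floorMod(src * MIXING_PRIME, (long) numParts);
    }
}
```

With `src = 1` and `numParts = 4`, the low 32 bits of the constant are the int -27, so the first method returns -3, while the second returns 1.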

git commit: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions

2014-09-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/branch-1.1 0c8183cb3 -> ffdb2fcf8 [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions If users set “spark.default.parallelism” to a value different from the EdgeRDD partition number, GraphX jobs will throw:

git commit: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions

2014-09-02 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master e9bb12bea -> 9b225ac30 [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions If users set “spark.default.parallelism” to a value different from the EdgeRDD partition number, GraphX jobs will throw:

svn commit: r1607545 - in /spark: images/graphx-perf-comparison.png site/images/graphx-perf-comparison.png

2014-07-03 Thread ankurdave
Author: ankurdave Date: Thu Jul 3 07:08:46 2014 New Revision: 1607545 URL: http://svn.apache.org/r1607545 Log: Correct the GraphX performance comparison graphic Modified: spark/images/graphx-perf-comparison.png spark/site/images/graphx-perf-comparison.png Modified: spark/images/graphx

git commit: Minor: Fix documentation error from apache/spark#946

2014-06-04 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master 11ded3f66 -> abea2d4ff Minor: Fix documentation error from apache/spark#946 Author: Ankur Dave ankurd...@gmail.com Closes #970 from ankurdave/SPARK-1991_docfix and squashes the following commits: 6d07343 [Ankur Dave] Minor: Fix

git commit: Synthetic GraphX Benchmark

2014-06-03 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master aa41a522d -> 894ecde04 Synthetic GraphX Benchmark This PR accomplishes two things: 1. It introduces a Synthetic Benchmark application that generates an arbitrarily large log-normal graph and executes either PageRank or connected
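
The benchmark's generator is not shown here. As a hedged sketch of how log-normal out-degrees can be drawn, each degree below is `floor(exp(mu + sigma * Z))` for a standard normal `Z`, with values at or above a cap redrawn. `mu`, `sigma`, `maxDegree`, and `LogNormalDegreeSketch` are hypothetical parameters and names, not values taken from GraphX.

```java
import java.util.Random;

/** Hypothetical sketch: sample per-vertex out-degrees from a capped log-normal distribution. */
final class LogNormalDegreeSketch {
    /**
     * Draws one out-degree per vertex as floor(exp(mu + sigma * Z)), Z ~ N(0, 1),
     * redrawing any value >= maxDegree so every result lies in [0, maxDegree).
     */
    static int[] sampleOutDegrees(int numVertices, double mu, double sigma, int maxDegree, long seed) {
        if (maxDegree < 1) {
            throw new IllegalArgumentException("maxDegree must be at least 1");
        }
        Random rng = new Random(seed);
        int[] degrees = new int[numVertices];
        for (int i = 0; i < numVertices; i++) {
            double d;
            do {
                d = Math.floor(Math.exp(mu + sigma * rng.nextGaussian()));
            } while (d >= maxDegree);  // rejection step keeps degrees below the cap
            degrees[i] = (int) d;
        }
        return degrees;
    }
}
```

A heavy right tail from `exp` of a Gaussian gives a few high-degree hubs and many low-degree vertices, which is why log-normal degrees are a common stand-in for real-world graphs.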

git commit: Enable repartitioning of graph over different number of partitions

2014-06-03 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master e8d93ee52 -> 5284ca78d Enable repartitioning of graph over different number of partitions It is currently very difficult to repartition a graph over a different number of partitions. This PR adds an additional `partitionBy` function that

git commit: initial version of LPA

2014-05-29 Thread ankurdave
: 327aee0 [haroldsultan] Merge pull request #2 from ankurdave/label-propagation 227a4d0 [Ankur Dave] Untabify 0ac574c [haroldsultan] Merge pull request #1 from ankurdave/label-propagation 0e24303 [Ankur Dave] Add LabelPropagationSuite 84aa061 [Ankur Dave] LabelPropagation: Fix compile errors and style
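
The entry lists only the squashed commits of the initial label propagation algorithm (LPA). As a rough, hypothetical illustration of synchronous LPA (not GraphX's `LabelPropagation`): every vertex starts with its own id as its label and, in each step, adopts the most frequent label among its neighbors. The smallest-label tie-breaking rule and `LabelPropagationSketch` are assumptions of this sketch. Synchronous updates can oscillate on some graphs (for example, bipartite ones), so the loop is bounded by `maxSteps`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of synchronous label propagation on an undirected adjacency list. */
final class LabelPropagationSketch {
    /**
     * Each vertex starts labeled with its own id, then repeatedly adopts the most
     * frequent label among its neighbors (ties broken by the smallest label here).
     */
    static long[] run(int[][] neighbors, int maxSteps) {
        int n = neighbors.length;
        long[] labels = new long[n];
        for (int v = 0; v < n; v++) {
            labels[v] = v;
        }
        for (int step = 0; step < maxSteps; step++) {
            long[] next = labels.clone();  // synchronous: read old labels, write new ones
            for (int v = 0; v < n; v++) {
                if (neighbors[v].length == 0) {
                    continue;  // isolated vertices keep their label
                }
                Map<Long, Integer> counts = new HashMap<>();
                for (int u : neighbors[v]) {
                    counts.merge(labels[u], 1, Integer::sum);
                }
                long best = labels[v];
                int bestCount = -1;
                for (Map.Entry<Long, Integer> e : counts.entrySet()) {
                    int c = e.getValue();
                    long label = e.getKey();
                    if (c > bestCount || (c == bestCount && label < best)) {
                        best = label;
                        bestCount = c;
                    }
                }
                next[v] = best;
            }
            labels = next;
        }
        return labels;
    }
}
```

On two disjoint triangles `{0, 1, 2}` and `{3, 4, 5}`, this sketch settles within two steps on label 0 for the first triangle and label 3 for the second.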