Repository: spark
Updated Branches:
refs/heads/master 172a52f5d -> 78062b852
[SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads to
slow convergence
## What changes were proposed in this pull request?
Change the initial value in all PageRank implementations to be
) }
```
This can be simplified with one join. ankurdave proposed a patch based on our
discussion in the mailing list:
https://www.mail-archive.com/dev@spark.apache.org/msg10316.html
Author: Alexander Ulanov na...@yandex.ru
Closes #7749 from avulanov/SPARK-9436-pregel and squashes the following
Repository: spark
Updated Branches:
refs/heads/master eba6a1af4 -> 587c315b2
[SPARK-9109] [GRAPHX] Keep the cached edge in the graph
The change here is to keep the cached RDDs in the graph object so that when the
graph.unpersist() is called these RDDs are correctly unpersisted.
```java
Repository: spark
Updated Branches:
refs/heads/branch-1.4 bb1401507 -> f34f3d71f
[SPARK-9109] [GRAPHX] Keep the cached edge in the graph
The change here is to keep the cached RDDs in the graph object so that when the
graph.unpersist() is called these RDDs are correctly unpersisted.
```java
Repository: spark
Updated Branches:
refs/heads/master d267c2834 -> 0a4071eab
[SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number of
partitions
See https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb
Author: Andrew Ray ray.and...@gmail.com
Closes #7104 from
Repository: spark
Updated Branches:
refs/heads/master 6f0d55d76 -> ae980eb41
[SPARK-6736][GraphX][Doc]Example of Graph#aggregateMessages has error
Example of Graph#aggregateMessages has error.
Since aggregateMessages is a method of Graph, it should be written
rawGraph.aggregateMessages
Repository: spark
Updated Branches:
refs/heads/master aad003227 -> 39fb57968
[SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference
Adds a `Graph#minus` method which will return only the unique `VertexId`s from
the calling `VertexRDD`.
To demonstrate a basic example with
Repository: spark
Updated Branches:
refs/heads/master aa6536fa3 -> 45f4c6612
[SPARK-5922][GraphX]: Add diff(other: RDD[VertexId, VD]) in VertexRDD
Changed method invocation of 'diff' to match that of 'innerJoin' and 'leftJoin'
from VertexRDD[VD] to RDD[(VertexId, VD)]. This change maintains
Repository: spark
Updated Branches:
refs/heads/branch-1.3 eaffc6edd -> 8073767f5
[SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing
Fixes the issue whereby when VertexRDD's are `diff`ed, `innerJoin`ed, or
`leftJoin`ed and have different partition sizes they fail under the
Repository: spark
Updated Branches:
refs/heads/master a777c65da -> 9f603fce7
[SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing
Fixes the issue whereby when VertexRDD's are `diff`ed, `innerJoin`ed, or
`leftJoin`ed and have different partition sizes they fail under the
Repository: spark
Updated Branches:
refs/heads/branch-1.2 a9abcaa2c -> 00112baf9
[SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing
Fixes the issue whereby when VertexRDD's are `diff`ed, `innerJoin`ed, or
`leftJoin`ed and have different partition sizes they fail under the
Repository: spark
Updated Branches:
refs/heads/branch-1.3 152147f5f -> db5747921
SPARK-3290 [GRAPHX] No unpersist calls in SVDPlusPlus
This just unpersist()s each RDD in this code that was cache()ed.
Author: Sean Owen so...@cloudera.com
Closes #4234 from srowen/SPARK-3290 and squashes the
Repository: spark
Updated Branches:
refs/heads/branch-1.3 bba095399 -> 5be8902f7
[SPARK-5343][GraphX]: ShortestPaths traverses backwards
Corrected the logic with ShortestPaths so that the calculation will run forward
rather than backwards. Output before looked like:
```scala
import
Repository: spark
Updated Branches:
refs/heads/master fd2c032f9 -> 582096128
[SPARK-5343][GraphX]: ShortestPaths traverses backwards
Corrected the logic with ShortestPaths so that the calculation will run forward
rather than backwards. Output before looked like:
```scala
import
Repository: spark
Updated Branches:
refs/heads/master cef1f092a -> e224dbb01
[SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a partitioner
of EdgeRDDImp...
If the value of 'spark.default.parallelism' does not match the number of
partitions in EdgePartition(EdgeRDDImpl),
the
Repository: spark
Updated Branches:
refs/heads/branch-1.2 2ea782a9d -> 73cb806f7
[SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a partitioner
of EdgeRDDImp...
If the value of 'spark.default.parallelism' does not match the number of
partitions in EdgePartition(EdgeRDDImpl),
Repository: spark
Updated Branches:
refs/heads/branch-1.2 e90f6b5c6 -> 37db20c94
[SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph
generator to prevent infinite loop
I looked into GraphGenerators#chooseCell, and found that chooseCell can't
generate more edges than
Repository: spark
Updated Branches:
refs/heads/master 7450a992b -> 3ee3ab592
[SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph
generator to prevent infinite loop
I looked into GraphGenerators#chooseCell, and found that chooseCell can't
generate more edges than pow(2, (2
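The failure mode named in the title can be sketched without Spark. The following is a minimal illustration, not the R-MAT generator itself: a hypothetical `DistinctEdgeSampler` draws uniformly random edges into a set until it holds `numEdges` distinct ones. Such a loop can only terminate if the request does not exceed the number of distinct edges the sampler is able to produce, so the sketch checks that up front. The bound used here (`numVertices * numVertices`) belongs to this uniform stand-in only; the bound for `chooseCell` is truncated in the message above and is not reproduced.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical stand-in for an edge generator; not the GraphX implementation.
object DistinctEdgeSampler {
  /** Draws `numEdges` distinct directed edges over vertices 0 until numVertices. */
  def sample(numVertices: Int, numEdges: Int, seed: Long): Set[(Int, Int)] = {
    // Upper bound for THIS uniform sampler: every ordered pair of vertices.
    val maxDistinctEdges = numVertices.toLong * numVertices
    // Fail fast: without this check the loop below would spin forever
    // whenever more distinct edges are requested than can exist.
    require(numEdges <= maxDistinctEdges,
      s"numEdges must be <= $maxDistinctEdges but was $numEdges")

    val rand = new Random(seed)
    val edges = mutable.Set.empty[(Int, Int)]
    while (edges.size < numEdges) {
      // Duplicates are silently absorbed by the set, so only new pairs count.
      edges += ((rand.nextInt(numVertices), rand.nextInt(numVertices)))
    }
    edges.toSet
  }
}
```

Validating before the loop turns a silent hang into an immediate `IllegalArgumentException`, which is usually easier to diagnose.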
Repository: spark
Updated Branches:
refs/heads/master 8d45834de -> f825e193f
[SPARK-4917] Add a function to convert into a graph with canonical edges in
GraphOps
Convert bi-directional edges into uni-directional ones instead of
'canonicalOrientation' in GraphLoader.edgeListFile.
This
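The idea of collapsing bi-directional edges into uni-directional ones can be sketched on a plain edge list. This is a simplified illustration under stated assumptions: `CanonicalEdges` and `canonicalize` are hypothetical names, edges are bare `(src, dst)` pairs, and edge attributes (which a real graph method would have to merge somehow) are left out.

```scala
// Hypothetical helper working on plain pairs; not the GraphOps method itself.
object CanonicalEdges {
  /**
   * Orients every edge so that the smaller vertex id comes first,
   * then drops the duplicates this creates (e.g. 1->2 and 2->1).
   */
  def canonicalize(edges: Seq[(Long, Long)]): Seq[(Long, Long)] =
    edges
      .map { case (src, dst) => if (src < dst) (src, dst) else (dst, src) }
      .distinct
}
```

For example, `CanonicalEdges.canonicalize(Seq((1L, 2L), (2L, 1L), (3L, 1L)))` keeps one edge between vertices 1 and 2 and reorients the last edge as `(1L, 3L)`.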
Repository: spark
Updated Branches:
refs/heads/master a6394bc2c -> 5e3ec1110
[Minor] Fix comments for GraphX 2D partitioning strategy
The sum of vertices on the matrix (v0 to v11) is 12, and I think one block
overlaps in this strategy.
This is a minor PR, so I didn't file a JIRA.
Author:
Repository: spark
Updated Branches:
refs/heads/master e895e0cbe -> 2e6b736b0
[SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark
This patch just replaces a native quick sorter with Sorter(TimSort) in Spark.
It could get performance gains by ~8% in my quick
Repository: spark
Updated Branches:
refs/heads/branch-1.2 27d9f13af -> a4ae7c8b5
[SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark
This patch just replaces a native quick sorter with Sorter(TimSort) in Spark.
It could get performance gains by ~8% in my quick
Repository: spark
Updated Branches:
refs/heads/master 2e6b736b0 -> 8817fc7fe
[SPARK-4620] Add unpersist in Graph and GraphImpl
Add an IF to uncache both vertices and edges of Graph/GraphImpl.
This IF is useful when iterative graph operations build a new graph in each
iteration, and the
Repository: spark
Updated Branches:
refs/heads/branch-1.2 a4ae7c8b5 -> 6b9e8b081
[SPARK-4620] Add unpersist in Graph and GraphImpl
Add an IF to uncache both vertices and edges of Graph/GraphImpl.
This IF is useful when iterative graph operations build a new graph in each
iteration, and the
Repository: spark
Updated Branches:
refs/heads/master 6eb1b6f62 -> e895e0cbe
[SPARK-3623][GraphX] GraphX should support the checkpoint operation
Author: GuoQiang Li wi...@qq.com
Closes #2631 from witgo/SPARK-3623 and squashes the following commits:
a70c500 [GuoQiang Li] Remove java related
Repository: spark
Updated Branches:
refs/heads/branch-1.2 11446a648 -> 27d9f13af
[SPARK-3623][GraphX] GraphX should support the checkpoint operation
Author: GuoQiang Li wi...@qq.com
Closes #2631 from witgo/SPARK-3623 and squashes the following commits:
a70c500 [GuoQiang Li] Remove java
Repository: spark
Updated Branches:
refs/heads/master 17c162f66 -> 77be8b986
[SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
The f closure of `PartitionsRDD(ZippedPartitionsRDD2)` contains a
Repository: spark
Updated Branches:
refs/heads/branch-1.2 528cce8bc -> 667f7ff44
[SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
The f closure of `PartitionsRDD(ZippedPartitionsRDD2)`
Repository: spark
Updated Branches:
refs/heads/master 5da21f07d -> fc0a1475e
[SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
Iterative GraphX applications always have long lineage, while
Repository: spark
Updated Branches:
refs/heads/branch-1.2 5e026a3e6 -> f1859fc18
[SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
Iterative GraphX applications always have long lineage, while
Repository: spark
Updated Branches:
refs/heads/master fc0a1475e -> 17c162f66
[SPARK-4672][GraphX]Non-transient PartitionsRDDs will lead to StackOverflow
error
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
In a nutshell, if `val partitionsRDD` in EdgeRDDImpl and
Repository: spark
Updated Branches:
refs/heads/master 23eaf0e12 -> d15c6e9dc
[SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx
At first, srcIds is not initialized and its entries are all 0, so we use
edgeArray(0).srcId to initialize currSrcId.
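The reasoning can be illustrated with a small, Spark-free sketch of building a "first offset per source id" index over edges sorted by source. The names `SourceIndex` and `build` are hypothetical and the code is a simplified assumption about the shape of the loop, not the `EdgePartitionBuilder` source. The point it shows: the run tracker is seeded from the first real edge, whereas seeding it from a zero-filled buffer would register a spurious entry for id 0.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of indexing sorted source ids.
object SourceIndex {
  /**
   * Given source ids sorted in ascending order, returns a map from each
   * distinct source id to the offset of its first occurrence.
   */
  def build(sortedSrcIds: Array[Long]): Map[Long, Int] = {
    val index = mutable.Map.empty[Long, Int]
    if (sortedSrcIds.nonEmpty) {
      // Seed from the first actual edge rather than from an uninitialized
      // (all-zero) array, so no bogus entry for id 0 is created.
      var currSrcId = sortedSrcIds(0)
      index(currSrcId) = 0
      var i = 0
      while (i < sortedSrcIds.length) {
        if (sortedSrcIds(i) != currSrcId) {
          // A new run of identical source ids starts at offset i.
          currSrcId = sortedSrcIds(i)
          index(currSrcId) = i
        }
        i += 1
      }
    }
    index.toMap
  }
}
```

With input `Array(5L, 5L, 7L, 9L, 9L)` the index contains exactly the ids 5, 7 and 9, and nothing for id 0.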
Author: lianhuiwang lianhuiwan...@gmail.com
Closes
Repository: spark
Updated Branches:
refs/heads/branch-1.2 aaaeaf939 -> 9061bc4e1
[SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx
At first, srcIds is not initialized and its entries are all 0, so we use
edgeArray(0).srcId to initialize currSrcId.
Author: lianhuiwang lianhuiwan...@gmail.com
Closes
Repository: spark
Updated Branches:
refs/heads/branch-1.1 c58c1bb83 -> 0a40eac25
[SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx
At first, srcIds is not initialized and its entries are all 0, so we use
edgeArray(0).srcId to initialize currSrcId.
Author: lianhuiwang lianhuiwan...@gmail.com
Closes
Repository: spark
Updated Branches:
refs/heads/branch-1.0 49224fd0f -> 76c20cac9
[SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx
At first, srcIds is not initialized and its entries are all 0, so we use
edgeArray(0).srcId to initialize currSrcId.
Author: lianhuiwang lianhuiwan...@gmail.com
Closes
Repository: spark
Updated Branches:
refs/heads/master 0bbe7faef -> 51229ff7f
[graphX] GraphOps: random pick vertex bug
When `numVertices > 50`, probability is set to 0. This would cause an infinite
loop.
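One way a probability can collapse to 0 for larger graphs is integer division. The sketch below assumes, for illustration only, that the probability is computed as 50 divided by the vertex count; `PickProbability` and its methods are hypothetical names, not GraphX code. A sampling loop that keeps each vertex with probability 0 never selects anything, so it never exits.

```scala
// Hypothetical illustration of the arithmetic; not the GraphOps source.
object PickProbability {
  /** Integer (Long) division: truncates to 0 as soon as numVertices exceeds 50. */
  def truncated(numVertices: Long): Long = 50 / numVertices

  /** Floating-point division: stays strictly positive for any positive count. */
  def fractional(numVertices: Long): Double = 50.0 / numVertices
}
```

For a graph with 100 vertices, `truncated` yields 0 while `fractional` yields 0.5, which is the difference between a loop that can never pick a vertex and one that eventually will.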
Author: yingjieMiao ying...@42go.com
Closes #2553 from yingjieMiao/graphx and squashes the
Xiao xia...@sjtu.edu.cn
Author: Blie Arkansol xia...@sjtu.edu.cn
Author: Ankur Dave ankurd...@gmail.com
Closes #1903 from larryxiao/2062 and squashes the following commits:
625aa9d [Blie Arkansol] Merge pull request #1 from ankurdave/SPARK-2062
476770b [Ankur Dave] ShippableVertexPartition.initFrom
.
Author: Ankur Dave ankurd...@gmail.com
Closes #2271 from ankurdave/SPARK-3400 and squashes the following commits:
10c2a97 [Ankur Dave] [HOTFIX] [SPARK-3400] Revert 9b225ac fix GraphX EdgeRDD
zipPartitions
(cherry picked from commit 00362dac976cd05b06638deb11d990d612429e0b)
Signed-off-by: Ankur
.
Author: Ankur Dave ankurd...@gmail.com
Closes #2271 from ankurdave/SPARK-3400 and squashes the following commits:
10c2a97 [Ankur Dave] [HOTFIX] [SPARK-3400] Revert 9b225ac fix GraphX EdgeRDD
zipPartitions
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip
.
Author: Ankur Dave ankurd...@gmail.com
Closes #2271 from ankurdave/SPARK-3400 and squashes the following commits:
10c2a97 [Ankur Dave] [HOTFIX] [SPARK-3400] Revert 9b225ac fix GraphX EdgeRDD
zipPartitions
(cherry picked from commit 00362dac976cd05b06638deb11d990d612429e0b)
Signed-off-by: Ankur
Repository: spark
Updated Branches:
refs/heads/master 6481d2742 -> e5d376801
[SPARK-3263][GraphX] Fix changes made to GraphGenerator.logNormalGraph in PR
#720
PR #720 made multiple changes to GraphGenerator.logNormalGraph including:
* Replacing the call to functions for generating random
Repository: spark
Updated Branches:
refs/heads/master 644e31524 -> 7c92b49d6
[SPARK-1986][GraphX]move lib.Analytics to org.apache.spark.examples
to support ~/spark/bin/run-example GraphXAnalytics triangles
/soc-LiveJournal1.txt --numEPart=256
Author: Larry Xiao xia...@sjtu.edu.cn
Closes
Repository: spark
Updated Branches:
refs/heads/master 7c92b49d6 -> 7c9bbf172
[SPARK-3123][GraphX]: override the setName function to set EdgeRDD's name
manually just as VertexRDD does.
Author: uncleGen husty...@gmail.com
Closes #2033 from uncleGen/master_origin and squashes the following
Repository: spark
Updated Branches:
refs/heads/master 7c9bbf172 -> aa7de128c
[SPARK-2981][GraphX] EdgePartition1D Int overflow
minor fix
detail is here: https://issues.apache.org/jira/browse/SPARK-2981
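The message itself gives no detail, so the following is only a generic sketch of how an Int overflow can arise when a 64-bit hash is mapped to a partition index; it is an assumption about the class of bug, not a reproduction of the patch. `PartitionIndex` and both method names are hypothetical. Narrowing a `Long` to `Int` before taking the modulus can wrap to a negative number, and the remainder of a negative number is negative in Scala.

```scala
// Hypothetical illustration of narrowing order; not the EdgePartition1D source.
object PartitionIndex {
  /** Narrows to Int first: the wrapped value, and thus the result, may be negative. */
  def truncateThenMod(hash: Long, numParts: Int): Int =
    hash.toInt % numParts

  /** Reduces in Long arithmetic first: for hash >= 0 the result lies in [0, numParts). */
  def modThenTruncate(hash: Long, numParts: Int): Int =
    (hash % numParts).toInt
}
```

With `hash = 3000000000L` and `numParts = 10`, the first form returns -6 (an invalid partition index) while the second returns 0.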
Author: Larry Xiao xia...@sjtu.edu.cn
Closes #1902 from larryxiao/2981 and squashes the
Repository: spark
Updated Branches:
refs/heads/branch-1.1 7267e402c -> 9b0cff2d4
[SPARK-2981][GraphX] EdgePartition1D Int overflow
minor fix
detail is here: https://issues.apache.org/jira/browse/SPARK-2981
Author: Larry Xiao xia...@sjtu.edu.cn
Closes #1902 from larryxiao/2981 and squashes
Repository: spark
Updated Branches:
refs/heads/branch-1.0 5481196ab -> d60f60ccc
[SPARK-2981][GraphX] EdgePartition1D Int overflow
minor fix
detail is here: https://issues.apache.org/jira/browse/SPARK-2981
Author: Larry Xiao xia...@sjtu.edu.cn
Closes #1902 from larryxiao/2981 and squashes
Repository: spark
Updated Branches:
refs/heads/branch-1.1 0c8183cb3 -> ffdb2fcf8
[SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions
If users set 'spark.default.parallelism' and the value is different
from the EdgeRDD partition number, GraphX jobs will throw:
Repository: spark
Updated Branches:
refs/heads/master e9bb12bea -> 9b225ac30
[SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions
If users set 'spark.default.parallelism' and the value is different
from the EdgeRDD partition number, GraphX jobs will throw:
Author: ankurdave
Date: Thu Jul 3 07:08:46 2014
New Revision: 1607545
URL: http://svn.apache.org/r1607545
Log:
Correct the GraphX performance comparison graphic
Modified:
spark/images/graphx-perf-comparison.png
spark/site/images/graphx-perf-comparison.png
Modified: spark/images/graphx
Repository: spark
Updated Branches:
refs/heads/master 11ded3f66 -> abea2d4ff
Minor: Fix documentation error from apache/spark#946
Author: Ankur Dave ankurd...@gmail.com
Closes #970 from ankurdave/SPARK-1991_docfix and squashes the following commits:
6d07343 [Ankur Dave] Minor: Fix
Repository: spark
Updated Branches:
refs/heads/master aa41a522d -> 894ecde04
Synthetic GraphX Benchmark
This PR accomplishes two things:
1. It introduces a Synthetic Benchmark application that generates an
arbitrarily large log-normal graph and executes either PageRank or connected
Repository: spark
Updated Branches:
refs/heads/master e8d93ee52 -> 5284ca78d
Enable repartitioning of graph over different number of partitions
It is currently very difficult to repartition a graph over a different number
of partitions. This PR adds an additional `partitionBy` function that
:
327aee0 [haroldsultan] Merge pull request #2 from ankurdave/label-propagation
227a4d0 [Ankur Dave] Untabify
0ac574c [haroldsultan] Merge pull request #1 from ankurdave/label-propagation
0e24303 [Ankur Dave] Add LabelPropagationSuite
84aa061 [Ankur Dave] LabelPropagation: Fix compile errors and style