This is an automated email from the ASF dual-hosted git repository.

fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git


The following commit(s) were added to refs/heads/master by this push:
     new 874d189  add comment to graph user docs to distribute edge table by source vertex id
874d189 is described below

commit 874d1892c5e35436c6e5bfc46ad9983a6587b159
Author: Frank McQuillan <fmcquil...@pivotal.io>
AuthorDate: Fri May 17 14:10:30 2019 -0700

    add comment to graph user docs to distribute edge table by source vertex id
---
 src/ports/postgres/modules/graph/apsp.sql_in     | 2 ++
 src/ports/postgres/modules/graph/bfs.sql_in      | 3 +++
 src/ports/postgres/modules/graph/hits.sql_in     | 3 +++
 src/ports/postgres/modules/graph/pagerank.sql_in | 3 +++
 src/ports/postgres/modules/graph/sssp.sql_in     | 3 +++
 src/ports/postgres/modules/graph/wcc.sql_in      | 5 +++--
 6 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/ports/postgres/modules/graph/apsp.sql_in b/src/ports/postgres/modules/graph/apsp.sql_in
index c7bf210..7cd77d3 100644
--- a/src/ports/postgres/modules/graph/apsp.sql_in
+++ b/src/ports/postgres/modules/graph/apsp.sql_in
@@ -55,6 +55,8 @@ for this implementation is O(V^2 * E) where V is the
 number of vertices and E is the number of edges.  In
 practice, run-time will be generally be
 much less than this, but it depends on the graph.
+On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
 
 @anchor apsp
 @par APSP
diff --git a/src/ports/postgres/modules/graph/bfs.sql_in b/src/ports/postgres/modules/graph/bfs.sql_in
index c1c27fe..ea991fa 100644
--- a/src/ports/postgres/modules/graph/bfs.sql_in
+++ b/src/ports/postgres/modules/graph/bfs.sql_in
@@ -130,6 +130,9 @@ and a single BFS result is generated.
 
 </dl>
 
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @anchor notes
 @par Notes
 
diff --git a/src/ports/postgres/modules/graph/hits.sql_in b/src/ports/postgres/modules/graph/hits.sql_in
index 96a507c..83f838d 100644
--- a/src/ports/postgres/modules/graph/hits.sql_in
+++ b/src/ports/postgres/modules/graph/hits.sql_in
@@ -127,6 +127,9 @@ parameter.
 
 </dl>
 
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @anchor notes
 @par Notes
 
diff --git a/src/ports/postgres/modules/graph/pagerank.sql_in b/src/ports/postgres/modules/graph/pagerank.sql_in
index b81b58e..cd239bd 100644
--- a/src/ports/postgres/modules/graph/pagerank.sql_in
+++ b/src/ports/postgres/modules/graph/pagerank.sql_in
@@ -132,6 +132,9 @@ for personalized PageRank. When this parameter is provided, personalized PageRan
 will run.  In the absence of this parameter, regular PageRank will run.
 </dl>
 
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @anchor examples
 @examp
 
diff --git a/src/ports/postgres/modules/graph/sssp.sql_in b/src/ports/postgres/modules/graph/sssp.sql_in
index 372f1fb..8175624 100644
--- a/src/ports/postgres/modules/graph/sssp.sql_in
+++ b/src/ports/postgres/modules/graph/sssp.sql_in
@@ -104,6 +104,9 @@ A summary table named <out_table>_summary is also created. This is an internal t
 <dd>TEXT, default = NULL. List of columns used to group the input into discrete subgraphs. These columns must exist in the edge table. When this value is null, no grouping is used and a single SSSP result is generated. </dd>
 </dl>
 
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
 @par Path Retrieval
 
 The path retrieval function returns the shortest path from the
diff --git a/src/ports/postgres/modules/graph/wcc.sql_in b/src/ports/postgres/modules/graph/wcc.sql_in
index 1c3808b..bc6ce7a 100644
--- a/src/ports/postgres/modules/graph/wcc.sql_in
+++ b/src/ports/postgres/modules/graph/wcc.sql_in
@@ -115,8 +115,9 @@ weakly connected components are generated for all data
 
 </dl>
 
-@note On Greenplum cluster, the edge table should be distributed on the src
-column for better performance. In addition, the user should note that this
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+In addition, the user should note that this
 function creates a duplicate of the edge table (on Greenplum cluster) for
 better performance.
 

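For illustration only (this is not part of the commit): the note added to each graph module could be followed roughly as sketched here. The table name `edge` and the column names `src`, `dest`, and `weight` are assumptions chosen for the example; substitute whatever source vertex id column your edge table actually uses. `DISTRIBUTED BY` is Greenplum syntax and will not run on plain PostgreSQL.

```sql
-- Hypothetical edge table: the names edge, src, dest, weight are
-- assumptions for this sketch, not taken from the commit.
-- Distribute rows across Greenplum segments by the source vertex id.
CREATE TABLE edge (
    src    INTEGER,
    dest   INTEGER,
    weight FLOAT8
) DISTRIBUTED BY (src);

-- An existing edge table can be redistributed in place instead.
ALTER TABLE edge SET DISTRIBUTED BY (src);
```

Redistributing a large existing table moves its rows between segments, so it may be worth doing once up front rather than before each call to a graph function.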