This is an automated email from the ASF dual-hosted git repository.
fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git
The following commit(s) were added to refs/heads/master by this push:
new 874d189 add comment to graph user docs to distribute edge table by source vertex id
874d189 is described below
commit 874d1892c5e35436c6e5bfc46ad9983a6587b159
Author: Frank McQuillan <[email protected]>
AuthorDate: Fri May 17 14:10:30 2019 -0700
add comment to graph user docs to distribute edge table by source vertex id
---
src/ports/postgres/modules/graph/apsp.sql_in | 2 ++
src/ports/postgres/modules/graph/bfs.sql_in | 3 +++
src/ports/postgres/modules/graph/hits.sql_in | 3 +++
src/ports/postgres/modules/graph/pagerank.sql_in | 3 +++
src/ports/postgres/modules/graph/sssp.sql_in | 3 +++
src/ports/postgres/modules/graph/wcc.sql_in | 5 +++--
6 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/src/ports/postgres/modules/graph/apsp.sql_in b/src/ports/postgres/modules/graph/apsp.sql_in
index c7bf210..7cd77d3 100644
--- a/src/ports/postgres/modules/graph/apsp.sql_in
+++ b/src/ports/postgres/modules/graph/apsp.sql_in
@@ -55,6 +55,8 @@ for this implementation is O(V^2 * E) where V is the
number of vertices and E is the number of edges. In
practice, run-time will be generally be
much less than this, but it depends on the graph.
+On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
@anchor apsp
@par APSP
diff --git a/src/ports/postgres/modules/graph/bfs.sql_in b/src/ports/postgres/modules/graph/bfs.sql_in
index c1c27fe..ea991fa 100644
--- a/src/ports/postgres/modules/graph/bfs.sql_in
+++ b/src/ports/postgres/modules/graph/bfs.sql_in
@@ -130,6 +130,9 @@ and a single BFS result is generated.
</dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
@anchor notes
@par Notes
diff --git a/src/ports/postgres/modules/graph/hits.sql_in b/src/ports/postgres/modules/graph/hits.sql_in
index 96a507c..83f838d 100644
--- a/src/ports/postgres/modules/graph/hits.sql_in
+++ b/src/ports/postgres/modules/graph/hits.sql_in
@@ -127,6 +127,9 @@ parameter.
</dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
@anchor notes
@par Notes
diff --git a/src/ports/postgres/modules/graph/pagerank.sql_in b/src/ports/postgres/modules/graph/pagerank.sql_in
index b81b58e..cd239bd 100644
--- a/src/ports/postgres/modules/graph/pagerank.sql_in
+++ b/src/ports/postgres/modules/graph/pagerank.sql_in
@@ -132,6 +132,9 @@ for personalized PageRank. When this parameter is provided, personalized PageRan
will run. In the absence of this parameter, regular PageRank will run.
</dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
@anchor examples
@examp
diff --git a/src/ports/postgres/modules/graph/sssp.sql_in b/src/ports/postgres/modules/graph/sssp.sql_in
index 372f1fb..8175624 100644
--- a/src/ports/postgres/modules/graph/sssp.sql_in
+++ b/src/ports/postgres/modules/graph/sssp.sql_in
@@ -104,6 +104,9 @@ A summary table named <out_table>_summary is also created. This is an internal t
<dd>TEXT, default = NULL. List of columns used to group the input into
discrete subgraphs. These columns must exist in the edge table. When this value
is null, no grouping is used and a single SSSP result is generated. </dd>
</dl>
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+
@par Path Retrieval
The path retrieval function returns the shortest path from the
diff --git a/src/ports/postgres/modules/graph/wcc.sql_in b/src/ports/postgres/modules/graph/wcc.sql_in
index 1c3808b..bc6ce7a 100644
--- a/src/ports/postgres/modules/graph/wcc.sql_in
+++ b/src/ports/postgres/modules/graph/wcc.sql_in
@@ -115,8 +115,9 @@ weakly connected components are generated for all data
</dl>
-@note On Greenplum cluster, the edge table should be distributed on the src
-column for better performance. In addition, the user should note that this
+@note On a Greenplum cluster, the edge table should be distributed
+by the source vertex id column for better performance.
+In addition, the user should note that this
function creates a duplicate of the edge table (on Greenplum cluster) for
better performance.
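
For readers of this notification, the note added in each file could be followed on Greenplum roughly as sketched below. This is not part of the commit. The table names (edge, existing_edge) and column names (src, dest, weight) are assumptions chosen for illustration; substitute whatever source vertex id column the edge table actually uses.

```sql
-- Hypothetical edge table; names are illustrative only, not taken from the commit.
-- DISTRIBUTED BY places rows with the same source vertex id on the same segment.
CREATE TABLE edge (
    src    INTEGER,   -- source vertex id (the distribution key)
    dest   INTEGER,   -- destination vertex id
    weight FLOAT8     -- edge weight, if the algorithm uses one
) DISTRIBUTED BY (src);

-- For an edge table that already exists with a different distribution key,
-- Greenplum can redistribute it in place (existing_edge is a hypothetical name):
ALTER TABLE existing_edge SET DISTRIBUTED BY (src);
```

On plain PostgreSQL there is no DISTRIBUTED BY clause, so the note applies only to Greenplum clusters.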