Repository: incubator-madlib
Updated Branches:
  refs/heads/master 4fcb60ed8 -> 029f73b15
Misc doc changes, mostly graph related

Closes #150

Project: http://git-wip-us.apache.org/repos/asf/incubator-madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-madlib/commit/029f73b1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-madlib/tree/029f73b1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-madlib/diff/029f73b1

Branch: refs/heads/master
Commit: 029f73b1567f50c012df71dea754a49125d6b049
Parents: 4fcb60e
Author: Frank McQuillan <fmcquil...@pivotal.io>
Authored: Fri Jul 14 13:38:02 2017 -0700
Committer: Orhan Kislal <okis...@pivotal.io>
Committed: Thu Jul 20 15:41:39 2017 -0700

----------------------------------------------------------------------
 doc/mainpage.dox.in                             |  2 +-
 src/ports/postgres/modules/graph/apsp.sql_in    | 15 +++++++-----
 src/ports/postgres/modules/graph/bfs.sql_in     | 25 ++++++++++++--------
 src/ports/postgres/modules/graph/sssp.sql_in    |  2 +-
 .../recursive_partitioning/decision_tree.sql_in |  2 +-
 5 files changed, 27 insertions(+), 19 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/029f73b1/doc/mainpage.dox.in
----------------------------------------------------------------------
diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in
index f64de15..de70d5d 100644
--- a/doc/mainpage.dox.in
+++ b/doc/mainpage.dox.in
@@ -129,7 +129,7 @@ complete matrix stored as a distributed table.
 @defgroup grp_apsp All Pairs Shortest Path
 @ingroup grp_graph
- @defgroup grp_bfs Breadth-first Search
+ @defgroup grp_bfs Breadth-First Search
 @ingroup grp_graph
 @defgroup grp_pagerank PageRank

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/029f73b1/src/ports/postgres/modules/graph/apsp.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/graph/apsp.sql_in b/src/ports/postgres/modules/graph/apsp.sql_in
index c8df70a..637afdf 100644
--- a/src/ports/postgres/modules/graph/apsp.sql_in
+++ b/src/ports/postgres/modules/graph/apsp.sql_in
@@ -46,12 +46,15 @@ The all pairs shortest paths (APSP) algorithm finds the length (summed weights)
 of the shortest paths between all pairs of vertices, such that the sum of the weights of the path edges is minimized.
-@note APSP is an expensive algorithm for run-time
+@warning APSP is an expensive algorithm for run-time
 because it finds the shortest path between all nodes
-in the graph. The worst case run-time for this implementation
-is O(V^2 * E) where V is the number of vertices and E is the
-number of edges. In practice, run-time will be generally be
-much less than this, depending on the graph.
+in the graph. It is recommended that you start with a
+small graph to get a sense of run-time for your use case,
+then increase size carefully from there. The worst case run-time
+for this implementation is O(V^2 * E) where V is the
+number of vertices and E is the number of edges. In
+practice, run-time will be generally be
+much less than this, but it depends on the graph.
 @anchor apsp
 @par APSP
@@ -112,7 +115,7 @@ table that keeps a record of the input parameters and is used by the path
 retrieval function described below. </dd>
-<dt>grouping_cols</dt>
+<dt>grouping_cols (optional)</dt>
 <dd>TEXT, default = NULL. List of columns used to group the input into discrete subgraphs. These columns must exist in the edge table.
 When this value is null, no grouping is used and a single APSP result is generated. </dd>

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/029f73b1/src/ports/postgres/modules/graph/bfs.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/graph/bfs.sql_in b/src/ports/postgres/modules/graph/bfs.sql_in
index fd7a396..f4e8edc 100644
--- a/src/ports/postgres/modules/graph/bfs.sql_in
+++ b/src/ports/postgres/modules/graph/bfs.sql_in
@@ -23,7 +23,7 @@
  * @brief SQL functions for graph analytics
  * @date Jun 2017
  *
- * @sa Provides Breadth First Search graph algorithm.
+ * @sa Provides a breadth first search graph algorithm.
  *
  *//* ----------------------------------------------------------------------- */
 m4_include(`SQLCommon.m4')
@@ -32,7 +32,7 @@ m4_include(`SQLCommon.m4')
 <div class="toc"><b>Contents</b>
 <ul>
-<li><a href="#bfs">Breadth-first Search</a></li>
+<li><a href="#bfs">Breadth-First Search</a></li>
 <li><a href="#notes">Notes</a></li>
 <li><a href="#examples">Examples</a></li>
 <li><a href="#literature">Literature</a></li>
@@ -41,7 +41,7 @@ m4_include(`SQLCommon.m4')
 @brief Finds the nodes reachable from a given source vertex using a breadth-first approach.
-Given a graph and a source vertex, the Breadth-first Search (BFS) algorithm
+Given a graph and a source vertex, the breadth-first search (BFS) algorithm
 finds all nodes reachable from the source vertex by searching / traversing the graph in a breadth-first manner.
@@ -84,7 +84,7 @@ the form "name=value". The following parameters are supported for
 this string argument:
 - src (INTEGER): Name of the column containing the source vertex ids in the edge table.
 Default column name is 'src'.
- (Not to be confused with the source_vertex argument passed to the BFS function)
+ (This is not to be confused with the 'source_vertex' argument passed to the BFS function.)
 - dest (INTEGER): Name of the column containing the destination vertex ids in the edge table.
 Default column name is 'dest'.
@@ -110,13 +110,18 @@ The output table will have the following columns (in addition to the grouping co
 A summary table named <out_table>_summary is also created. This is an internal table that keeps a record of the input parameters. </dd>
-<dt>max_distance</dt>
-<dd>INT, default = NULL. Maximum distance (number of edges) from source_vertex to search through in the graph.</dd>
+<dt>max_distance (optional)</dt>
+<dd>INT, default = NULL. Maximum distance to traverse
+from the source vertex. When this value is null,
+traverses until reaches leaf node. E.g., if set
+to 1 will return only adjacent vertices, if set
+to 7 will return vertices up to a maximum distance
+of 7 vertices away.
-<dt>directed</dt>
+<dt>directed (optional)</dt>
 <dd>BOOLEAN, default = FALSE. If TRUE the graph will be treated as directed, else it will be treated as an undirected graph.</dd>
-<dt>grouping_cols</dt>
+<dt>grouping_cols (optional)</dt>
 <dd>TEXT, default = NULL. A comma-separated list of columns used to group the input into discrete subgraphs. These columns must exist in the edge table. When this value is NULL, no grouping is used
@@ -128,8 +133,8 @@ and a single BFS result is generated.
 @anchor notes
 @par Notes
-The graph_bfs function is a SQL implementation of the well-known Breadth-first
-Search algorithm [1] modified appropriately for a relational database. It will
+The graph_bfs function is a SQL implementation of the well-known breadth-first
+search algorithm [1] modified appropriately for a relational database. It will
 find any node in the graph reachable from the source_vertex only once. If a node is reachable by many different paths from the source_vertex (i.e. has more than one parent), then only one of those parents is present in the output table.
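
For readers skimming the BFS doc changes above, the documented behavior of max_distance, directed, and the single-parent rule can be illustrated with a minimal in-memory sketch. This is not MADlib's SQL implementation: the function name `bfs`, the edge-list input, and the `(dist, parent)` result shape are all assumptions made only for this illustration.

```python
from collections import deque


def bfs(edges, source_vertex, max_distance=None, directed=False):
    """Hypothetical helper: map each reachable vertex to (dist, parent).

    edges is a list of (src, dest) pairs. With directed=False (the documented
    default) each edge can be traversed in both directions.
    """
    adjacency = {}
    for src, dest in edges:
        adjacency.setdefault(src, []).append(dest)
        if not directed:
            adjacency.setdefault(dest, []).append(src)

    # Each vertex is recorded once, with the first parent that reached it.
    result = {source_vertex: (0, None)}
    queue = deque([source_vertex])
    while queue:
        vertex = queue.popleft()
        dist = result[vertex][0]
        # max_distance=None means: keep going until nothing new is reachable.
        if max_distance is not None and dist >= max_distance:
            continue
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in result:
                result[neighbor] = (dist + 1, vertex)
                queue.append(neighbor)
    return result


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(bfs(edges, 0, max_distance=1))  # source plus adjacent vertices only
print(bfs(edges, 0))                  # vertex 3 is listed with one parent
```

In the sample graph vertex 3 is reachable through both 1 and 2, yet it appears once with a single parent, which mirrors the note in the Notes section of the diff.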

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/029f73b1/src/ports/postgres/modules/graph/sssp.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/graph/sssp.sql_in b/src/ports/postgres/modules/graph/sssp.sql_in
index fb0cdba..372f1fb 100644
--- a/src/ports/postgres/modules/graph/sssp.sql_in
+++ b/src/ports/postgres/modules/graph/sssp.sql_in
@@ -100,7 +100,7 @@ the following columns (in addition to the grouping columns):
 A summary table named <out_table>_summary is also created. This is an internal table that keeps a record of the input parameters and is used by the path function described below. </dd>
-<dt>grouping_cols</dt>
+<dt>grouping_cols (optional)</dt>
 <dd>TEXT, default = NULL. List of columns used to group the input into discrete subgraphs. These columns must exist in the edge table. When this value is null, no grouping is used and a single SSSP result is generated. </dd>
 </dl>

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/029f73b1/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
index 92a123d..e8f37f8 100644
--- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
+++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
@@ -186,7 +186,7 @@ tree_train(
 </table>
 </DD>
- <DT>surrogate_params</DT>
+ <DT>surrogate_params (optional)</DT>
 <DD>TEXT. Comma-separated string of key-value pairs controlling the behavior of surrogate splits for each node. A surrogate variable is another predictor variable that is associated (correlated) with the primary predictor variable