[jira] [Comment Edited] (MADLIB-1102) Graph - Breadth First Search / Traversal

Frank McQuillan (JIRA) Fri, 19 May 2017 14:32:15 -0700

    [ 
https://issues.apache.org/jira/browse/MADLIB-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018038#comment-16018038
 ]


Frank McQuillan edited comment on MADLIB-1102 at 5/19/17 9:30 PM:
------------------------------------------------------------------

Here is my suggestion on the interface for the initial version in 1.12.   It is 
a simple first implementation that we can build on for later.  Follow on JIRA 
that adds more capability
https://issues.apache.org/jira/browse/MADLIB-1105

Below uses SSSP as a model
http://madlib.incubator.apache.org/docs/latest/group__grp__sssp.html

{code}
graph_bfs (
        vertex_table,    
        vertex_id,       
        edge_table,      
        edge_args,       
        source_vertex, 
        out_table, 
        max_distance,
        directed,
        grouping_cols
)
{code}

Arguments

vertex_table
TEXT. Name of the table containing the vertex data for the graph. Must contain 
the column specified in the 'vertex_id' parameter below.

vertex_id
TEXT, default = 'id'. Name of the column in 'vertex_table' containing vertex 
ids. The vertex ids are of type INTEGER with no duplicates. They do not need to 
be contiguous.

edge_table
TEXT. Name of the table containing the edge data. The edge table must contain 
columns for source vertex, destination vertex and edge weight. Column naming 
convention is described below in the 'edge_args' parameter.

edge_args
TEXT. A comma-delimited string containing multiple named arguments of the form 
"name=value". The following parameters are supported for this string argument:
* src (INTEGER): Name of the column containing the source vertex ids in the 
edge table. Default column name is 'src'.
* dest (INTEGER): Name of the column containing the destination vertex ids in 
the edge table. Default column name is 'dest'.
* weight (FLOAT8): Name of the column containing the edge weights in the edge 
table. Default column name is 'weight'.

source_vertex
INTEGER. The source vertex id for the traversal to start. This vertex id must 
exist in the 'vertex_id' column of 'vertex_table'.

out_table
TEXT. Name of the table to store the result of BFS. It contains a row for every 
vertex visited of every group and has the following columns (in addition to the 
grouping columns):
* vertex_id : The id for the vertex visited. Will use the input parameter 
'vertex_id' for column naming.
* distance:  The distance of the vertex with respect to the source vertex.  
E.g., 1 means it is an adjacent vertex.

max_distance (optional)
        INTEGER, default = NULL.  Distance to traverse from the source vertex.  
When this value is null, traverses until reaches leaf node.  E.g., if set to 1 
will return only adjacent vertices, if set to 7 will return vertices up to a 
maximum distance of 7 vertices away.

directed (optional)
        BOOLEAN, default = FALSE meaning graph is treated as undirected.  If 
TRUE, graph is treated as directed.

grouping_cols (optional)
TEXT, default = NULL. List of columns used to group the input into discrete 
subgraphs. These columns must exist in the edge table. When this value is null, 
no grouping is used and a single BFS result is generated.



was (Author: fmcquillan):
Here is my suggestion on the interface for the initial version in 1.12.   It is 
a simple first implementation that we can build on for later.

Follow on JIRA is here
https://issues.apache.org/jira/browse/MADLIB-1105

Below uses SSSP as a model
http://madlib.incubator.apache.org/docs/latest/group__grp__sssp.html

{code}
graph_bfs (
        vertex_table,    
        vertex_id,       
        edge_table,      
        edge_args,       
        source_vertex, 
        out_table, 
        max_distance,
        directed,
        grouping_cols
)
{code}

Arguments

vertex_table
TEXT. Name of the table containing the vertex data for the graph. Must contain 
the column specified in the 'vertex_id' parameter below.

vertex_id
TEXT, default = 'id'. Name of the column in 'vertex_table' containing vertex 
ids. The vertex ids are of type INTEGER with no duplicates. They do not need to 
be contiguous.

edge_table
TEXT. Name of the table containing the edge data. The edge table must contain 
columns for source vertex, destination vertex and edge weight. Column naming 
convention is described below in the 'edge_args' parameter.

edge_args
TEXT. A comma-delimited string containing multiple named arguments of the form 
"name=value". The following parameters are supported for this string argument:
* src (INTEGER): Name of the column containing the source vertex ids in the 
edge table. Default column name is 'src'.
* dest (INTEGER): Name of the column containing the destination vertex ids in 
the edge table. Default column name is 'dest'.
* weight (FLOAT8): Name of the column containing the edge weights in the edge 
table. Default column name is 'weight'.

source_vertex
INTEGER. The source vertex id for the traversal to start. This vertex id must 
exist in the 'vertex_id' column of 'vertex_table'.

out_table
TEXT. Name of the table to store the result of BFS. It contains a row for every 
vertex visited of every group and has the following columns (in addition to the 
grouping columns):
* vertex_id : The id for the vertex visited. Will use the input parameter 
'vertex_id' for column naming.
* distance:  The distance of the vertex with respect to the source vertex.  
E.g., 1 means it is an adjacent vertex.

max_distance (optional)
        INTEGER, default = NULL.  Distance to traverse from the source vertex.  
When this value is null, traverses until reaches leaf node.  E.g., if set to 1 
will return only adjacent vertices, if set to 7 will return vertices up to a 
maximum distance of 7 vertices away.

directed (optional)
        BOOLEAN, default = FALSE meaning graph is treated as undirected.  If 
TRUE, graph is treated as directed.

grouping_cols (optional)
TEXT, default = NULL. List of columns used to group the input into discrete 
subgraphs. These columns must exist in the edge table. When this value is null, 
no grouping is used and a single BFS result is generated.


> Graph - Breadth First Search / Traversal
> ----------------------------------------
>
>                 Key: MADLIB-1102
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1102
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Graph
>            Reporter: Rashmi Raghu
>            Assignee: Rashmi Raghu
>             Fix For: v1.12
>
>
> Story
> As a MADlib user and developer, I want to implement Breadth First Search / 
> Traversal for a graph. BFS is also a core part of the connected components 
> graph algorithm.
> Accpetance:
> 1) Interface defined
> 2) Design doc updated
> 3) Documentation and on-line help
> 4) IC and functional tests
> 5) Scale tests
> References:
> [0] [https://en.wikipedia.org/wiki/Breadth-first_search] 
> "Breadth-first search (BFS) is an algorithm for traversing or searching tree 
> or graph data structures. It starts at the tree root (or some arbitrary node 
> of a graph, sometimes referred to as a 'search key'[1]) and explores the 
> neighbor nodes first, before moving to the next level neighbors."
> [1] [http://www.geeksforgeeks.org/breadth-first-traversal-for-a-graph/]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (MADLIB-1102) Graph - Breadth First Search / Traversal

Reply via email to