kaknikhil commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1114882309


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -47,14 +48,20 @@ from graph_utils import validate_output_and_summary_tables
 
 def validate_wcc_args(schema_madlib, vertex_table, vertex_id, edge_table,
                       edge_params, out_table, out_table_summary,
-                      grouping_cols_list, module_name):
+                      grouping_cols_list, warm_start, out_table_message, 
module_name):
     """
     Function to validate input parameters for wcc
     """
     validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, module_name)
-    _assert(not table_exists(out_table_summary),
-            "Graph {module_name}: Output summary table already 
exists!".format(**locals()))
+                          out_table, module_name, warm_start)
+    if not warm_start:
+        _assert(not table_exists(out_table_summary),

Review Comment:
   Maybe we could also assert that the msg table doesn't exist for non warm 
start scenarios ?



##########
src/ports/postgres/modules/graph/graph_utils.py_in:
##########
@@ -74,14 +74,18 @@ def validate_output_and_summary_tables(model_out_table, 
module_name,
                 "Graph WCC: Output table {0} already 
exists.".format(out_table))
 
 def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, func_name, **kwargs):
+                          out_table, func_name, warm_start = False, **kwargs):
     """
     Validates graph tables (vertex and edge) as well as the output table.
     """
     _assert(out_table and out_table.strip().lower() not in ('null', ''),
-            "Graph {func_name}: Invalid output table name!".format(**locals()))
-    _assert(not table_exists(out_table),
-            "Graph {func_name}: Output table already 
exists!".format(**locals()))
+                "Graph {func_name}: Invalid output table 
name!".format(**locals()))
+    if warm_start:
+        _assert(table_exists(out_table),

Review Comment:
   Shouldn't we also assert that the msg table exists ?



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -47,14 +48,20 @@ from graph_utils import validate_output_and_summary_tables
 
 def validate_wcc_args(schema_madlib, vertex_table, vertex_id, edge_table,
                       edge_params, out_table, out_table_summary,
-                      grouping_cols_list, module_name):
+                      grouping_cols_list, warm_start, out_table_message, 
module_name):
     """
     Function to validate input parameters for wcc
     """
     validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, module_name)
-    _assert(not table_exists(out_table_summary),
-            "Graph {module_name}: Output summary table already 
exists!".format(**locals()))
+                          out_table, module_name, warm_start)
+    if not warm_start:
+        _assert(not table_exists(out_table_summary),
+                "Graph {module_name}: Output summary table already 
exists!".format(**locals()))
+    else:
+        _assert(table_exists(out_table_summary),
+                "Graph {module_name}: Output summary table is missing for warm 
start!".format(**locals()))
+        _assert(table_exists(out_table_message),
+                "Graph {module_name}: Output message table is missing for warm 
start! Either the wcc was completed in the last run or the table got 
dropped/renamed.".format(**locals()))

Review Comment:
   s/the wcc/wcc



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +227,14 @@ def wcc(schema_madlib, vertex_table, vertex_id, 
edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        out_table_sql = ""
+        msg_sql = ""
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':

Review Comment:
   We do the same check later again, I'm assuming we need both ?



##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,12 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 
'user_id', 2);

Review Comment:
   1. Can there be any issues if the user upgrades madlib and then uses the new 
function with warm start set to true on the previous out tables ? We might have 
to explicitly call this out as well
   2. We should also add a test case for this scenario (This is based on the 
assumption that it takes 5 iterations to update all the nodes)
       * Run 2 iterations without warm start
       * Run 2 more with warm start set to true
       * Run 1 or 2 more with warm start set to true.
       For the first two runs, we should at the very least assert that 
nodes_to_update > 0 and for the final run, we should assert the contents of the 
out table and also that nodes_to_update = 0
   3. There are still tables leftover after each warm start wcc run(with or 
without grouping). I think it might be a good idea to add a test to make sure 
that there aren't any extra tables left behind
   4. Another scenario to consider:
       * Let's say it takes 10 iterations to update all the nodes in the graph
       * User runs wcc with iteration limit of 2
       * User runs wcc with iteration limit of 3 and warm start
       * User runs wcc with iteration limit of 3 and warm start but removes the 
grouping col
   What is the ideal behavior here ? (same applies if we start with no grouping 
col and then start using one before all the nodes have been updated)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@madlib.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to