Github user jingyimei commented on a diff in the pull request:
https://github.com/apache/madlib/pull/244#discussion_r177910146
--- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
@@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id,
edge_table, edge_args,
distinct_grp_table, grouping_cols_list)
# Find number of vertices in each group, this is the
normalizing factor
# for computing the random_prob
+ where_clause_ppr = ''
+ if nodes_of_interest > 0:
+ where_clause_ppr = """where __vertices__ =
ANY(ARRAY{nodes_of_interest})""".format(
--- End diff --
After consulting with QP, `__vertices__ = ANY(ARRAY{nodes_of_interest})`
works exactly the same as `__vertices__ IN (nodes_of_interest)`, and the `IN`
form may look simpler.
Besides, since we use this condition in multiple places, I am wondering whether
a join would be faster: we could create a temp table that stores the special
node ids and join it with the vertex table on vertex id. QP suggested trying
both approaches and seeing which one runs faster.
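
To make the comparison concrete, here is a rough sketch of the three query shapes as plain Python string builders. The helper names, the `node_id` column, and the temp table name are hypothetical and not part of this PR. The sketch also assumes the node ids are integers. It only builds SQL text; it does not say which form is faster.

```python
def _id_list(node_ids):
    # Assumption: vertex ids are integers; int() also guards the formatting.
    return ", ".join(str(int(n)) for n in node_ids)


def where_any(column, node_ids):
    # Shape currently used in the diff: column = ANY(ARRAY[...])
    return "WHERE {0} = ANY(ARRAY[{1}])".format(column, _id_list(node_ids))


def where_in(column, node_ids):
    # Equivalent IN-list form suggested above.
    return "WHERE {0} IN ({1})".format(column, _id_list(node_ids))


def join_on_temp_table(vertex_table, vertex_id, temp_table, node_ids):
    # Alternative: store the ids once in a temp table and join against it.
    # temp_table and the node_id column are hypothetical names.
    rows = ", ".join("({0})".format(int(n)) for n in node_ids)
    setup = [
        "CREATE TEMP TABLE {0} (node_id INTEGER)".format(temp_table),
        "INSERT INTO {0} VALUES {1}".format(temp_table, rows),
    ]
    query = ("SELECT v.* FROM {0} v JOIN {1} n ON v.{2} = n.node_id"
             .format(vertex_table, temp_table, vertex_id))
    return setup, query
```

Running each generated query under `EXPLAIN ANALYZE` on a representative graph would be one way to do the "try both" comparison.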
---