Github user jingyimei commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/244#discussion_r177910146
  
    --- Diff: src/ports/postgres/modules/graph/pagerank.py_in ---
    @@ -211,19 +261,30 @@ def pagerank(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
                     distinct_grp_table, grouping_cols_list)
                 # Find number of vertices in each group, this is the normalizing factor
                 # for computing the random_prob
    +            where_clause_ppr = ''
    +            if nodes_of_interest > 0:
    +                where_clause_ppr = """where __vertices__ = ANY(ARRAY{nodes_of_interest})""".format(
    --- End diff --
    
    After consulting with QP, `__vertices__ = ANY(ARRAY{nodes_of_interest})` 
works exactly the same as `__vertices__ in (nodes_of_interest)`, and the `in` 
form may look simpler.
    
    Besides, since we use this condition in multiple places, I am wondering 
whether a join would be faster: we could create a temp table that stores the 
special node ids and join it with the vertex table on vertex id. QP suggested 
trying both and seeing which one runs faster.
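
To make the options concrete, here is a rough sketch, not the PR's actual code. The helper names (`where_any`, `where_in`, `compare_in_and_join`) and the `vertex` / `nodes_of_interest` table names are made up for illustration. The join is checked against an in-memory SQLite database only as a stand-in, since SQLite does not support `ANY(ARRAY[...])`. It shows that the two forms select the same rows, but says nothing about speed, which would have to be timed on the real database.

```python
import sqlite3


def where_any(column, node_ids):
    # PostgreSQL-style form from the diff: col = ANY(ARRAY[1, 2, 3]).
    # Formatting a Python list of ints yields the "[1, 2, 3]" literal.
    return "where {0} = ANY(ARRAY{1})".format(column, list(node_ids))


def where_in(column, node_ids):
    # Equivalent IN-list form. Assumes node_ids is non-empty and integer-valued.
    id_list = ", ".join(str(int(n)) for n in node_ids)
    return "where {0} in ({1})".format(column, id_list)


def compare_in_and_join(vertex_ids, node_ids):
    # Hypothetical check: filter with IN vs. join against a temp table.
    # SQLite is only a stand-in here; table names are illustrative.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table vertex (id integer)")
    conn.executemany("insert into vertex values (?)",
                     [(v,) for v in vertex_ids])
    conn.execute("create temp table nodes_of_interest (id integer)")
    conn.executemany("insert into nodes_of_interest values (?)",
                     [(n,) for n in node_ids])
    in_rows = conn.execute(
        "select id from vertex {0} order by id".format(
            where_in("id", node_ids))).fetchall()
    join_rows = conn.execute(
        "select v.id from vertex v join nodes_of_interest n "
        "on v.id = n.id order by v.id").fetchall()
    conn.close()
    return in_rows, join_rows
```

One thing to watch with the join variant: if the list of special node ids contains duplicates, the join returns duplicate rows while `in` / `= ANY` does not, so the temp table should hold distinct ids.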

