Re: [igraph] R help: Interpretation of PageRank scores for different setups(weighted/mutligraph)

Tamas Nepusz Wed, 30 Nov 2016 00:40:00 -0800

Hi,

Sorry for the late response. Most likely there is a bug in how PRPACK
treats directed graphs; in theory, there should be no difference between
the ARPACK and PRPACK implementations (apart from the latter being faster
and more stable), both should treat directed=F and as.undirected() the same
way. I was briefly looking at the conversion between an igraph graph and
PRPACK's internal data structures in the source code, but I could not find
anything yet that could explain the difference; feel free to take a look
yourself as well:


https://github.com/igraph/igraph/blob/master/src/prpack/prpack_igraph_graph.cpp

I have added an issue to Github but I don't think I will be able to get
around to fixing this any time in the near future, so please chime in if
you managed to figure out anything in the meanwhile:

https://github.com/igraph/igraph/issues/972


T.

On Sat, Nov 19, 2016 at 11:59 AM, danopiacek . <[email protected]>
wrote:

> Hello to everyone,
>
> First of all, I would like to thank the authors of these amazing package.
> I use it extensively in my master thesis.
> I am sorry for duplicate if there already exists same thread.
>
> I have a huge network with couple of millions of edges and would like to
> calculate personalized PageRank. My network is attributed and I would like
> to use edge weights to calculate PageRank. I can either consider my network
> as directed or undirected, but in both cases my network is multigraph. I
> have encountered several 'strange' results when calculating weighted
> PageRank.
>
> Here I will try to present it on simple networks without 'personalization'.
>
> 1a.) Example with directed symmetric network, with no multiple edges and
> no weights.
>     A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3)
>     g = graph.adjacency(A)
>
> Then I compute PageRank with page_rank function and get the same result
> for both "PRPACK" and "ARPACK" algo. I can also compute via matrix inverse
> as follows:
>    D = diag(1/apply(A,1,sum),nrow(A)); Beta =
> (1-0.85)*rep(1/nrow(A),nrow(A))
>     (pr.in = Beta%*%solve(diag(1,nrow(A))-0.85*D%*%A))
> And as expected I get the same PageRank score. In addition, if I consider
> network as undirected I get the same scores, as expected.
>
> 1b.) The same example with edge weights.
>
>     set.seed(1)
>     E(g)$weight = round(runif(ecount(g)),1)
> And again, using page_rank I get the same scores for both algorithms. Or I
> can get the same scores with weight matrix *W* as:
>     W = matrix(c(0,0.3,0.4,0.6,0,0,0.9,0,0),byrow=T,nrow=3)
>     D = diag(1/apply(W,1,sum),nrow(W)); Beta =
> (1-0.85)*rep(1/nrow(W),nrow(W))
>     (pr.in = Beta%*%solve(diag(1,nrow(W))-0.85*D%*%W))
> However, if I consider this network being undirected, I get different
> scores, namely:
>     g_UN = as.undirected(g,'each') # do not collapse
>     (prUn1 = page_rank(g,algo='PRPACK', directed = F)$vector)
>     (prUn2 = page_rank(g,algo='ARPACK', directed = F)$vector)
>     (prUn3 = page_rank(g_UN, algo='PRPACK')$vector)
>     (prUn4 = page_rank(g_UN, algo='ARPACK')$vector)
> Here scores given by 'ARPACK' are different to those given by 'PRPACK'.
> Moreover, with 'ARPACK' I get same scores for both *g_UN*(undirected
> network) and for *directed=F*. However 'PRPACK' gives different scores.
> I can replicate results for 'ARPACK' with weight matrix, where I take
> sum/average of edge weights:
>     W_UN = matrix(c(0,0.9,1.3,0.9,0,0,1.3,0,0),byrow=T,nrow=3)
>     D = diag(1/apply(W_UN,1,sum),nrow(W_UN))
>     (pr.in = Beta%*%solve(diag(1,nrow(W_UN))-0.85*D%*%W_UN))
>
> 2a.) Example with directed multigraph.
>
>     A2 = matrix(c(0,2,1,1,0,0,1,1,0),byrow=T,nrow=3)
>     g2 = graph.adjacency(A2)
> Without considering edge weights both algorithms give the same scores,
> which can be also found as
>     D = diag(1/apply(A2,1,sum),nrow(A2))
>     (pr.in = Beta%*%solve(diag(1,nrow(A2))-0.85*D%*%A2))
>
> 2b.) Same graph as undirected.
> Here again both algorithms give the same score for both *directed=F* and
> *as.undirected(g2,'each')*, or:
>     A2_UN = matrix(c(0,3,2,3,0,1,2,1,0),byrow=T,nrow=3)
>     D = diag(1/apply(A2_UN,1,sum),nrow(A2_UN))
>     (pr.in = Beta%*%solve(diag(1,nrow(A2_UN))-0.85*D%*%A2_UN))
> gives the same scores.
>
> 2c.) Same graph with edge weights-directed.
>     set.seed(1)
>     E(g2)$weight = round(runif(ecount(g2)),1)
> And again I get different scores
>     page_rank(g2,algo='PRPACK')$vector
>     page_rank(g2,algo='ARPACK')$vector
> Where 'ARPACK' gives the same scores as with weight matrix *W* where edge
> weights for multiple
> edges(considering direction) are summed
>     W2 = matrix(c(0,0.7,0.6,0.9,0,0,0.2,0.9,0),byrow=T,nrow=3)
>     D = diag(1/apply(W2,1,sum),nrow(W2))
>     (pr.in = Beta%*%solve(diag(1,nrow(W2))-0.85*D%*%W2))
>
> 2d.) Same graph with edge weights-undirected.
>     g2_UN = as.undirected(g2,'each') # do not collapse
> Again, 'PRPACK' and 'ARPACK' give different scores. Moreover, 'PRPACK'
> differs for
>     (prUn1 = page_rank(g2,algo='PRPACK',directed = F)$vector)
>     (prUN2 = page_rank(g2_UN,algo='PRPACK')$vector)
> , while 'ARPACK' gives same scores for both. Or with weight matrix *W *I
> can replicate 'ARPACK' by summing edge weights:
>     W2_UN = matrix(c(0,1.6,0.8,1.6,0,0.9,0.8,0.9,0),nrow=3,byrow = T)
>     D = diag(1/apply(W2_UN,1,sum),nrow(W2_UN))
>     (pr.in = Beta%*%solve(diag(1,nrow(W2_UN))-0.85*D%*%W2_UN))
>
> I have examined .c files (centrality.c, arpack.c) and I know that 'ARPACK'
> treats *directed=F* and *as.undirected() *same.
>
> So the main questions are, how does 'PRPACK' treat these two options and
> moreover why 'PRPACK' gives different results than 'ARPACK'/weight matrix.
>
> Thank you very much.
> Daniel Piacek
> 
>
> 
>
> _______________________________________________
> igraph-help mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/igraph-help
>
>

_______________________________________________
igraph-help mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/igraph-help

Re: [igraph] R help: Interpretation of PageRank scores for different setups(weighted/mutligraph)

Reply via email to