Re: [igraph] R help: Interpretation of PageRank scores for different setups(weighted/mutligraph)

danopiacek . Thu, 01 Dec 2016 01:14:35 -0800

Hi Tamas,

So, I have tried several types of graphs and *'PRPACK', 'ARPACK'* and I
also wrote simple power iteration in R. Here are some observations so far.
1.) For small networks *'PRPACK'* seems to have problem to deal with
combination of edge weights and *directed=F / as.undirected(g, 'each')*.
One would expect that PageRank for directed=F / as.undirected(g,'each') /
as.undirected(g)
should be the same, however only the last option(which sums edge weights)
gives the same PageRank as *'ARPACK'* or power iteration(with weight matrix
instead of adjacency, where weights are summed).
2.) Another example is again small network, *directed *and *weighted*, but
with multiple edges(considering directions). Then '* PRPACK'* gives
different scores then other methods. And moreover if it's at the same time
personalized version, *'PRPACK'* scores doesn't add up to 1! Only after
simplification *simplify(g, edge.attr.comb = 'sum')* it coincides with
other two methods.
3.) If I create network with 'sink' nodes, *'ARPACK'* (directed) seems to
have problem for personalized version of PageRank.
4.) There also seems to be problem with *as.undirected(g) *vs.
*as.undirected(g,'each')*. Please see attached .R file with sample
network(bipartite) and demonstration of this behavior. So maybe that's the
reason why 'PRPACK' gives strange scores?



###
Now when I move from these small sample networks to my huge weighted,
directed network(millions of links) I got bit different results.
1.) If network contains loops,  *'ARPACK'* seems to give wrong scores.
2.) Again surprisingly problem no.* 2.)* from previous part, is no longer
present.

So for now I have to stick with the power method. I'll find out something
related to this problem, I'll let you know.

With regards,
Daniel

2016-11-30 9:38 GMT+01:00 Tamas Nepusz <[email protected]>:

> Hi,
>
> Sorry for the late response. Most likely there is a bug in how PRPACK
> treats directed graphs; in theory, there should be no difference between
> the ARPACK and PRPACK implementations (apart from the latter being faster
> and more stable), both should treat directed=F and as.undirected() the same
> way. I was briefly looking at the conversion between an igraph graph and
> PRPACK's internal data structures in the source code, but I could not find
> anything yet that could explain the difference; feel free to take a look
> yourself as well:
>
> https://github.com/igraph/igraph/blob/master/src/prpack/
> prpack_igraph_graph.cpp
>
> I have added an issue to Github but I don't think I will be able to get
> around to fixing this any time in the near future, so please chime in if
> you managed to figure out anything in the meanwhile:
>
> https://github.com/igraph/igraph/issues/972
>
>
> T.
>
> On Sat, Nov 19, 2016 at 11:59 AM, danopiacek . <[email protected]>
> wrote:
>
>> Hello to everyone,
>>
>> First of all, I would like to thank the authors of these amazing package.
>> I use it extensively in my master thesis.
>> I am sorry for duplicate if there already exists same thread.
>>
>> I have a huge network with couple of millions of edges and would like to
>> calculate personalized PageRank. My network is attributed and I would like
>> to use edge weights to calculate PageRank. I can either consider my network
>> as directed or undirected, but in both cases my network is multigraph. I
>> have encountered several 'strange' results when calculating weighted
>> PageRank.
>>
>> Here I will try to present it on simple networks without
>> 'personalization'.
>>
>> 1a.) Example with directed symmetric network, with no multiple edges and
>> no weights.
>>     A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3)
>>     g = graph.adjacency(A)
>>
>> Then I compute PageRank with page_rank function and get the same result
>> for both "PRPACK" and "ARPACK" algo. I can also compute via matrix inverse
>> as follows:
>>    D = diag(1/apply(A,1,sum),nrow(A)); Beta =
>> (1-0.85)*rep(1/nrow(A),nrow(A))
>>     (pr.in = Beta%*%solve(diag(1,nrow(A))-0.85*D%*%A))
>> And as expected I get the same PageRank score. In addition, if I consider
>> network as undirected I get the same scores, as expected.
>>
>> 1b.) The same example with edge weights.
>>
>>     set.seed(1)
>>     E(g)$weight = round(runif(ecount(g)),1)
>> And again, using page_rank I get the same scores for both algorithms. Or
>> I can get the same scores with weight matrix *W* as:
>>     W = matrix(c(0,0.3,0.4,0.6,0,0,0.9,0,0),byrow=T,nrow=3)
>>     D = diag(1/apply(W,1,sum),nrow(W)); Beta =
>> (1-0.85)*rep(1/nrow(W),nrow(W))
>>     (pr.in = Beta%*%solve(diag(1,nrow(W))-0.85*D%*%W))
>> However, if I consider this network being undirected, I get different
>> scores, namely:
>>     g_UN = as.undirected(g,'each') # do not collapse
>>     (prUn1 = page_rank(g,algo='PRPACK', directed = F)$vector)
>>     (prUn2 = page_rank(g,algo='ARPACK', directed = F)$vector)
>>     (prUn3 = page_rank(g_UN, algo='PRPACK')$vector)
>>     (prUn4 = page_rank(g_UN, algo='ARPACK')$vector)
>> Here scores given by 'ARPACK' are different to those given by 'PRPACK'.
>> Moreover, with 'ARPACK' I get same scores for both *g_UN*(undirected
>> network) and for *directed=F*. However 'PRPACK' gives different scores.
>> I can replicate results for 'ARPACK' with weight matrix, where I take
>> sum/average of edge weights:
>>     W_UN = matrix(c(0,0.9,1.3,0.9,0,0,1.3,0,0),byrow=T,nrow=3)
>>     D = diag(1/apply(W_UN,1,sum),nrow(W_UN))
>>     (pr.in = Beta%*%solve(diag(1,nrow(W_UN))-0.85*D%*%W_UN))
>>
>> 2a.) Example with directed multigraph.
>>
>>     A2 = matrix(c(0,2,1,1,0,0,1,1,0),byrow=T,nrow=3)
>>     g2 = graph.adjacency(A2)
>> Without considering edge weights both algorithms give the same scores,
>> which can be also found as
>>     D = diag(1/apply(A2,1,sum),nrow(A2))
>>     (pr.in = Beta%*%solve(diag(1,nrow(A2))-0.85*D%*%A2))
>>
>> 2b.) Same graph as undirected.
>> Here again both algorithms give the same score for both *directed=F* and
>> *as.undirected(g2,'each')*, or:
>>     A2_UN = matrix(c(0,3,2,3,0,1,2,1,0),byrow=T,nrow=3)
>>     D = diag(1/apply(A2_UN,1,sum),nrow(A2_UN))
>>     (pr.in = Beta%*%solve(diag(1,nrow(A2_UN))-0.85*D%*%A2_UN))
>> gives the same scores.
>>
>> 2c.) Same graph with edge weights-directed.
>>     set.seed(1)
>>     E(g2)$weight = round(runif(ecount(g2)),1)
>> And again I get different scores
>>     page_rank(g2,algo='PRPACK')$vector
>>     page_rank(g2,algo='ARPACK')$vector
>> Where 'ARPACK' gives the same scores as with weight matrix *W* where
>> edge weights for multiple
>> edges(considering direction) are summed
>>     W2 = matrix(c(0,0.7,0.6,0.9,0,0,0.2,0.9,0),byrow=T,nrow=3)
>>     D = diag(1/apply(W2,1,sum),nrow(W2))
>>     (pr.in = Beta%*%solve(diag(1,nrow(W2))-0.85*D%*%W2))
>>
>> 2d.) Same graph with edge weights-undirected.
>>     g2_UN = as.undirected(g2,'each') # do not collapse
>> Again, 'PRPACK' and 'ARPACK' give different scores. Moreover, 'PRPACK'
>> differs for
>>     (prUn1 = page_rank(g2,algo='PRPACK',directed = F)$vector)
>>     (prUN2 = page_rank(g2_UN,algo='PRPACK')$vector)
>> , while 'ARPACK' gives same scores for both. Or with weight matrix *W *I
>> can replicate 'ARPACK' by summing edge weights:
>>     W2_UN = matrix(c(0,1.6,0.8,1.6,0,0.9,0.8,0.9,0),nrow=3,byrow = T)
>>     D = diag(1/apply(W2_UN,1,sum),nrow(W2_UN))
>>     (pr.in = Beta%*%solve(diag(1,nrow(W2_UN))-0.85*D%*%W2_UN))
>>
>> I have examined .c files (centrality.c, arpack.c) and I know that
>> 'ARPACK' treats *directed=F* and *as.undirected() *same.
>>
>> So the main questions are, how does 'PRPACK' treat these two options and
>> moreover why 'PRPACK' gives different results than 'ARPACK'/weight matrix.
>>
>> Thank you very much.
>> Daniel Piacek
>> 
>>
>> 
>>
>> _______________________________________________
>> igraph-help mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/igraph-help
>>
>>
>
> _______________________________________________
> igraph-help mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/igraph-help
>
>

############
## Bipartite case
############
P_names = as.character(1:5)
PartyID = data.frame(PartyID = P_names, fraud = c(1,1,0,0,0), stringsAsFactors = F)
P_mat = matrix(0,nrow(PartyID),nrow(PartyID)); rownames(P_mat) = P_names; colnames(P_mat) = P_names
A_names = paste0('A',1:13)
PA_mat = matrix(c(1,1,rep(0,11),0,0,1,1,1,rep(0,8),rep(0,5),1,rep(0,7),rep(0,6),1,1,rep(0,5),rep(0,8),1,1,0,0,0),byrow=T,nrow=5)
rownames(PA_mat) = P_names; colnames(PA_mat) = A_names
A_mat = matrix(c(sample(0:4,13,replace = T), sample(0:1,13,replace = T), sample(0:2,13,replace = T),
                 sample(0:3,13,replace = T), sample(0:2,9*13,replace = T)),byrow = T,nrow=13)
colnames(A_mat) = rownames(A_mat) =  A_names
diag(A_mat) = 0 # no loops
# Create disconnected component
A_mat[c(6,12),] = 0; A_mat[,c(6,12)] = 0
A_mat[6,12] = 4
## Create sink node
A_mat[11,] = 0
## Add some more 0s
A_mat[1:2,3:5] = 0; A_mat[3:5,1:2] = 0
A_mat[c(11,13),c(11,13)] = 0
#
A = rbind(cbind(P_mat,PA_mat),cbind(t(PA_mat),A_mat))
g = graph.adjacency(A)
## Set attributes
V(g)$pers = c(1,1,rep(0,vcount(g)-2))
## Edge weights
set.seed(1)
E(g)$weight = round(runif(ecount(g)),1)

### AS.UNDIRECTED and sum of edge weights
g_UN = as.undirected(g,'each') ## This keeps every edge and do not combine edge weights
g_UN2 = as.undirected(g) ## This somehow collapse edges, however edge weights are combined in strange way
E(g_UN)$weight
E(g_UN2)$weight ## Here edge weights doesn't make sense
g_simUN = simplify(g_UN, edge.attr.comb = list(weight='sum'))
g_simUN2 = simplify(g_UN2, edge.attr.comb = list(weight='sum'))
E(g_simUN)$weight ## This gives correct edge weights(sum of all edge weights between two vertices)
E(g_simUN2)$weight ## Here edge weights doesn't make sense

_______________________________________________
igraph-help mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/igraph-help

Re: [igraph] R help: Interpretation of PageRank scores for different setups(weighted/mutligraph)

Reply via email to