Hi Tamas, So, I have tried several types of graphs and *'PRPACK', 'ARPACK'* and I also wrote simple power iteration in R. Here are some observations so far. 1.) For small networks *'PRPACK'* seems to have problem to deal with combination of edge weights and *directed=F / as.undirected(g, 'each')*. One would expect that PageRank for directed=F / as.undirected(g,'each') / as.undirected(g) should be the same, however only the last option(which sums edge weights) gives the same PageRank as *'ARPACK'* or power iteration(with weight matrix instead of adjacency, where weights are summed). 2.) Another example is again small network, *directed *and *weighted*, but with multiple edges(considering directions). Then '* PRPACK'* gives different scores then other methods. And moreover if it's at the same time personalized version, *'PRPACK'* scores doesn't add up to 1! Only after simplification *simplify(g, edge.attr.comb = 'sum')* it coincides with other two methods. 3.) If I create network with 'sink' nodes, *'ARPACK'* (directed) seems to have problem for personalized version of PageRank. 4.) There also seems to be problem with *as.undirected(g) *vs. *as.undirected(g,'each')*. Please see attached .R file with sample network(bipartite) and demonstration of this behavior. So maybe that's the reason why 'PRPACK' gives strange scores?

## Advertising

### Now when I move from these small sample networks to my huge weighted, directed network(millions of links) I got bit different results. 1.) If network contains loops, *'ARPACK'* seems to give wrong scores. 2.) Again surprisingly problem no.* 2.)* from previous part, is no longer present. So for now I have to stick with the power method. I'll find out something related to this problem, I'll let you know. With regards, Daniel 2016-11-30 9:38 GMT+01:00 Tamas Nepusz <nta...@rmki.kfki.hu>: > Hi, > > Sorry for the late response. Most likely there is a bug in how PRPACK > treats directed graphs; in theory, there should be no difference between > the ARPACK and PRPACK implementations (apart from the latter being faster > and more stable), both should treat directed=F and as.undirected() the same > way. I was briefly looking at the conversion between an igraph graph and > PRPACK's internal data structures in the source code, but I could not find > anything yet that could explain the difference; feel free to take a look > yourself as well: > > https://github.com/igraph/igraph/blob/master/src/prpack/ > prpack_igraph_graph.cpp > > I have added an issue to Github but I don't think I will be able to get > around to fixing this any time in the near future, so please chime in if > you managed to figure out anything in the meanwhile: > > https://github.com/igraph/igraph/issues/972 > > > T. > > On Sat, Nov 19, 2016 at 11:59 AM, danopiacek . <dano.pia...@gmail.com> > wrote: > >> Hello to everyone, >> >> First of all, I would like to thank the authors of these amazing package. >> I use it extensively in my master thesis. >> I am sorry for duplicate if there already exists same thread. >> >> I have a huge network with couple of millions of edges and would like to >> calculate personalized PageRank. My network is attributed and I would like >> to use edge weights to calculate PageRank. I can either consider my network >> as directed or undirected, but in both cases my network is multigraph. I >> have encountered several 'strange' results when calculating weighted >> PageRank. >> >> Here I will try to present it on simple networks without >> 'personalization'. >> >> 1a.) Example with directed symmetric network, with no multiple edges and >> no weights. >> A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3) >> g = graph.adjacency(A) >> >> Then I compute PageRank with page_rank function and get the same result >> for both "PRPACK" and "ARPACK" algo. I can also compute via matrix inverse >> as follows: >> D = diag(1/apply(A,1,sum),nrow(A)); Beta = >> (1-0.85)*rep(1/nrow(A),nrow(A)) >> (pr.in = Beta%*%solve(diag(1,nrow(A))-0.85*D%*%A)) >> And as expected I get the same PageRank score. In addition, if I consider >> network as undirected I get the same scores, as expected. >> >> 1b.) The same example with edge weights. >> >> set.seed(1) >> E(g)$weight = round(runif(ecount(g)),1) >> And again, using page_rank I get the same scores for both algorithms. Or >> I can get the same scores with weight matrix *W* as: >> W = matrix(c(0,0.3,0.4,0.6,0,0,0.9,0,0),byrow=T,nrow=3) >> D = diag(1/apply(W,1,sum),nrow(W)); Beta = >> (1-0.85)*rep(1/nrow(W),nrow(W)) >> (pr.in = Beta%*%solve(diag(1,nrow(W))-0.85*D%*%W)) >> However, if I consider this network being undirected, I get different >> scores, namely: >> g_UN = as.undirected(g,'each') # do not collapse >> (prUn1 = page_rank(g,algo='PRPACK', directed = F)$vector) >> (prUn2 = page_rank(g,algo='ARPACK', directed = F)$vector) >> (prUn3 = page_rank(g_UN, algo='PRPACK')$vector) >> (prUn4 = page_rank(g_UN, algo='ARPACK')$vector) >> Here scores given by 'ARPACK' are different to those given by 'PRPACK'. >> Moreover, with 'ARPACK' I get same scores for both *g_UN*(undirected >> network) and for *directed=F*. However 'PRPACK' gives different scores. >> I can replicate results for 'ARPACK' with weight matrix, where I take >> sum/average of edge weights: >> W_UN = matrix(c(0,0.9,1.3,0.9,0,0,1.3,0,0),byrow=T,nrow=3) >> D = diag(1/apply(W_UN,1,sum),nrow(W_UN)) >> (pr.in = Beta%*%solve(diag(1,nrow(W_UN))-0.85*D%*%W_UN)) >> >> 2a.) Example with directed multigraph. >> >> A2 = matrix(c(0,2,1,1,0,0,1,1,0),byrow=T,nrow=3) >> g2 = graph.adjacency(A2) >> Without considering edge weights both algorithms give the same scores, >> which can be also found as >> D = diag(1/apply(A2,1,sum),nrow(A2)) >> (pr.in = Beta%*%solve(diag(1,nrow(A2))-0.85*D%*%A2)) >> >> 2b.) Same graph as undirected. >> Here again both algorithms give the same score for both *directed=F* and >> *as.undirected(g2,'each')*, or: >> A2_UN = matrix(c(0,3,2,3,0,1,2,1,0),byrow=T,nrow=3) >> D = diag(1/apply(A2_UN,1,sum),nrow(A2_UN)) >> (pr.in = Beta%*%solve(diag(1,nrow(A2_UN))-0.85*D%*%A2_UN)) >> gives the same scores. >> >> 2c.) Same graph with edge weights-directed. >> set.seed(1) >> E(g2)$weight = round(runif(ecount(g2)),1) >> And again I get different scores >> page_rank(g2,algo='PRPACK')$vector >> page_rank(g2,algo='ARPACK')$vector >> Where 'ARPACK' gives the same scores as with weight matrix *W* where >> edge weights for multiple >> edges(considering direction) are summed >> W2 = matrix(c(0,0.7,0.6,0.9,0,0,0.2,0.9,0),byrow=T,nrow=3) >> D = diag(1/apply(W2,1,sum),nrow(W2)) >> (pr.in = Beta%*%solve(diag(1,nrow(W2))-0.85*D%*%W2)) >> >> 2d.) Same graph with edge weights-undirected. >> g2_UN = as.undirected(g2,'each') # do not collapse >> Again, 'PRPACK' and 'ARPACK' give different scores. Moreover, 'PRPACK' >> differs for >> (prUn1 = page_rank(g2,algo='PRPACK',directed = F)$vector) >> (prUN2 = page_rank(g2_UN,algo='PRPACK')$vector) >> , while 'ARPACK' gives same scores for both. Or with weight matrix *W *I >> can replicate 'ARPACK' by summing edge weights: >> W2_UN = matrix(c(0,1.6,0.8,1.6,0,0.9,0.8,0.9,0),nrow=3,byrow = T) >> D = diag(1/apply(W2_UN,1,sum),nrow(W2_UN)) >> (pr.in = Beta%*%solve(diag(1,nrow(W2_UN))-0.85*D%*%W2_UN)) >> >> I have examined .c files (centrality.c, arpack.c) and I know that >> 'ARPACK' treats *directed=F* and *as.undirected() *same. >> >> So the main questions are, how does 'PRPACK' treat these two options and >> moreover why 'PRPACK' gives different results than 'ARPACK'/weight matrix. >> >> Thank you very much. >> Daniel Piacek >> >> >> >> >> _______________________________________________ >> igraph-help mailing list >> igraph-help@nongnu.org >> https://lists.nongnu.org/mailman/listinfo/igraph-help >> >> > > _______________________________________________ > igraph-help mailing list > igraph-help@nongnu.org > https://lists.nongnu.org/mailman/listinfo/igraph-help > >

############ ## Bipartite case ############ P_names = as.character(1:5) PartyID = data.frame(PartyID = P_names, fraud = c(1,1,0,0,0), stringsAsFactors = F) P_mat = matrix(0,nrow(PartyID),nrow(PartyID)); rownames(P_mat) = P_names; colnames(P_mat) = P_names A_names = paste0('A',1:13) PA_mat = matrix(c(1,1,rep(0,11),0,0,1,1,1,rep(0,8),rep(0,5),1,rep(0,7),rep(0,6),1,1,rep(0,5),rep(0,8),1,1,0,0,0),byrow=T,nrow=5) rownames(PA_mat) = P_names; colnames(PA_mat) = A_names A_mat = matrix(c(sample(0:4,13,replace = T), sample(0:1,13,replace = T), sample(0:2,13,replace = T), sample(0:3,13,replace = T), sample(0:2,9*13,replace = T)),byrow = T,nrow=13) colnames(A_mat) = rownames(A_mat) = A_names diag(A_mat) = 0 # no loops # Create disconnected component A_mat[c(6,12),] = 0; A_mat[,c(6,12)] = 0 A_mat[6,12] = 4 ## Create sink node A_mat[11,] = 0 ## Add some more 0s A_mat[1:2,3:5] = 0; A_mat[3:5,1:2] = 0 A_mat[c(11,13),c(11,13)] = 0 # A = rbind(cbind(P_mat,PA_mat),cbind(t(PA_mat),A_mat)) g = graph.adjacency(A) ## Set attributes V(g)$pers = c(1,1,rep(0,vcount(g)-2)) ## Edge weights set.seed(1) E(g)$weight = round(runif(ecount(g)),1) ### AS.UNDIRECTED and sum of edge weights g_UN = as.undirected(g,'each') ## This keeps every edge and do not combine edge weights g_UN2 = as.undirected(g) ## This somehow collapse edges, however edge weights are combined in strange way E(g_UN)$weight E(g_UN2)$weight ## Here edge weights doesn't make sense g_simUN = simplify(g_UN, edge.attr.comb = list(weight='sum')) g_simUN2 = simplify(g_UN2, edge.attr.comb = list(weight='sum')) E(g_simUN)$weight ## This gives correct edge weights(sum of all edge weights between two vertices) E(g_simUN2)$weight ## Here edge weights doesn't make sense

_______________________________________________ igraph-help mailing list igraph-help@nongnu.org https://lists.nongnu.org/mailman/listinfo/igraph-help