Hi Omer,

Your comment rang a bell -- I remembered that this issue has already popped
up back in 2014 when we first switched from ARPACK to PRPACK; see the
following issue in the issue tracker:

https://github.com/igraph/igraph/issues/671

Memories of mine about PRPACK were much fresher by then as I wrote the
following:

   - with the ARPACK implementation, the random walker teleports according
   to the distribution of the reset vector when it encounters a dangling node
   - with the PRPACK implementation, the random walker teleports according
   to a uniform distribution by default.

After reading the thread, it also became apparent that PRPACK is doing some
trickery with sink nodes; the following is from @dgleich, one of the
original authors of PRPACK:

"For the case that you are jumping to nodes that have no outgoing edges,
what happens is you add new edges according to the teleportation set/reset
set or the special set "u"."

So, all in all, the case is probably that your implementation matches the
ARPACK implementation of PRPACK from igraph, and that's why you are seeing
identical results (both your code and igraph's ARPACK implementation
formulates PageRank as an eigenvector problem and solves that). In case of
PRPACK, PageRank is not an eigenvector problem any more; again,
quoting @dgleich:

"PRPACK does decompose the graph into SCCs, but the primary advantage is
that it frames the PageRank problem as a linear system instead of the
eigenvalue problem. This has tremendous numerical advantages."

There was also a PR quite a while ago that attempted to introduce the
possibility of specifying the personalization vector and the reset vector
separately, but it did not get merged in the end:

https://github.com/igraph/igraph/pull/673

Another issue where this question popped up -- maybe it also provides more
insight:

https://github.com/igraph/igraph/issues/1211

Best,
T.


On Mon, 27 Jan 2020 at 23:17, Yalcin, Omer Faruk <o...@psu.edu> wrote:

> Thank you very much for tracking the code! Unfortunately, that doesn't
> work either. I am also fairly certain that allowing the node to stay at the
> same spot would give that node an unwarranted boost in pagerank, so it is
> probably undesirable.
>
> I do have an interesting result though; when I use "arpack" instead of the
> default "prpack", I get the exact same results as my custom written
> function. In other words, when there are nodes that have no outgoing edges,
> "prpack" and "arpack" do the computation differently.
>
> My problem seems to be solved (as long as there is no reason why "arpack"
> is wrong) but this difference between the two algorithms might be of
> interest to you.
>
> Thank you very much.
> Omer
> ------------------------------
> *From:* Tamas Nepusz <nta...@gmail.com>
> *Sent:* Monday, January 27, 2020 4:15 PM
> *To:* Yalcin, Omer Faruk <o...@psu.edu>
> *Cc:* Help for igraph users <igraph-help@nongnu.org>
> *Subject:* Re: [igraph] personalized pagerank computation issue
>
>
> That being said, after your question, I set the probability of navigating
> to other nodes from a node that has no outbound links to the
> personalization vector. That doesn't reproduce the igraph result either.
>
> There's also a third option: if there are no outbound nodes, stay at the
> same node with probability equal to 1-damping, _or_ navigate to a randomly
> picked node accoding to the persionalization vector with probability equal
> to damping. Sorry for not being too precise here; the thing is that igraph
> is using an external library (PRPACK) to calculate personalized PageRank
> scores, and I only managed to track the code to a point where I am
> convinced that we are passing down two  vectors to PRPACK; one is a uniform
> vector, and the other one is the personalization vector submitted by the
> user. Based on this, I would assume that PRPACK uses the personalization
> vector when the random walk is reset, and the uniform vector for a random
> teleport (after all, why would PRPACK need two vectors if it used the
> personalization vector for both cases?), but I did not manage to track it
> down further because PRPACK contains at least six different solvers,
> optimized for different use-cases, and I did not manage to figure out which
> one it would use in your particular case. But I'm pretty sure that the
> discrepancy between your results and ours is due to some corner case in the
> handling of sink nodes.
>
> T.
>
>
_______________________________________________
igraph-help mailing list
igraph-help@nongnu.org
https://lists.nongnu.org/mailman/listinfo/igraph-help

Reply via email to