On 15.04.20 at 05:40, Deklan Webster wrote:
>> Most typically, edge prediction is
>> formulated as a binary classification task [7], in which each
>> missing (or spurious) edge is attributed a “score” (which
>> may or may not be a probability), so that those that reach a
>> prespecified discrimination threshold are classified as true
>> edges (or true nonedges)
> 
> I think this is misleading. This is not typical.  As far as I can tell,
> very few people use unsupervised binary classification for link
> prediction. Most typically, edge prediction is formulated as a
> *supervised* binary classification task. From that setup, you can
> calibrate probabilities based on ability to predict.

It's not misleading at all; this is exactly how a binary classifier
works, supervised or otherwise. How you find the threshold is beside
the point.
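
To make the formulation concrete, here is a minimal, generic sketch of
score-and-threshold edge classification. The scores and threshold below
are hypothetical, and this is not graph-tool's API; the point is only
that the setup is the same regardless of how the scores were obtained:

```python
# Generic score-and-threshold edge classification, independent of
# whether the scores came from a supervised or unsupervised method.
def classify_edges(scores, threshold):
    """Return the candidate edges whose score reaches the threshold."""
    return {edge for edge, s in scores.items() if s >= threshold}

# Hypothetical scores for missing-edge candidates (need not be
# probabilities, as the quoted passage notes).
scores = {(0, 1): 0.9, (0, 2): 0.4, (1, 3): 0.75, (2, 3): 0.1}
predicted = classify_edges(scores, threshold=0.5)
# (0, 1) and (1, 3) reach the threshold and are classified as true edges.
```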

>> Indeed selecting the optimal threshold can be done by cross validation
>> in a *supervised* setting, but even then this will depend in general on
>> the fraction of removed edges, size of test set, etc.
> 
> I agree, this is a valid concern. If this is your objection to the
> *supervised* binary classification formulation then I think this should
> be the statement in the paper.

This is just another difference. And I wasn't "objecting", just
explaining what you had misunderstood.

> Well, I have run it multiple times with different numbers. To be sure, I
> just now ran it with the epsilon removed, 2000 wait, multiflip on, and
> then 100k(!) sampling iterations. Results were pretty much the same.

It's a shame.

> I noticed that in the docs you now recommend setting beta=np.inf when
> using multiflip. What exactly is this doing? (I plan to soon read that
> other paper you mention which probably includes this..) 

You didn't even have to read any paper, just the docstring would have
been enough.

Beta is the inverse temperature parameter, and setting it to infinity
means turning the MCMC into a greedy optimization heuristic.

And I don't "recommend it". It is not applicable to your context.
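
For intuition only: in a Metropolis-style acceptance rule, an inverse
temperature beta controls how often moves that increase the description
length S are accepted, with probability exp(-beta * ΔS). As beta goes to
infinity, every uphill move is rejected and the chain degenerates into a
greedy optimizer. This is a generic sketch of that rule, not
graph-tool's internal implementation:

```python
import math
import random

def accept(delta_S, beta):
    """Metropolis rule: always accept improvements (delta_S <= 0);
    accept uphill moves with probability exp(-beta * delta_S)."""
    if delta_S <= 0:
        return True
    if math.isinf(beta):
        return False  # beta = inf: greedy, never move uphill
    return random.random() < math.exp(-beta * delta_S)

# With beta = inf, only moves that decrease S are ever accepted,
# so the "sampling" becomes a pure optimization heuristic.
```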

To be honest, I find the pattern of saying "I plan to read your
documentation/paper at some point, but could you please just explain
this to me before I do so" a bit disrespectful. Why is my time worth
less than yours?

> I noticed that in your paper you didn't compare your reconstruction
> method performance against any baseline. How do you know how well it's
> performing if you don't have a baseline? I'm currently pessimistic given
> its performance on the graph I'm testing. Some of those graphs you were
> testing on in the paper are loaded into graph-tool, right? It would be
> fairly easy to train a RandomForest (no fancy boosted trees necessary)
> with the stacked similarity measures from graph-tool (and maybe a few
> other simple features I have in mind...) and test the performance
> against your reconstruction approach (at least for just link
> prediction). Interested in this? Conjectures? I would be willing to do
> it for some of the moderately-sized graphs.

This kind of comparison has already been done in
https://arxiv.org/abs/1802.10582 and https://arxiv.org/abs/1909.07578.
The SBM approach is the single best classifier among the over one
hundred they consider, and is marginally beaten only by a stacking of
around 40 other predictors.

In any case that was not the point of the PRX paper, which was to
develop an actual Bayesian reconstruction algorithm, not a binary
classifier. AFAIK there is no other algorithm that does this, so there
was nothing to compare to. If you're worried about comparing with binary
classifiers, you can just convert this approach into one by using the
marginal posterior probabilities as "scores" and go from there, as the
papers above do. Then you are comparing apples to apples.

If you have further questions about how to use the library, please go
ahead and ask. But if you want to discuss how to compare supervised
vs. unsupervised edge prediction, etc., please take that discussion off
the list, since it's off-topic.

Best,
Tiago

-- 
Tiago de Paula Peixoto <[email protected]>


_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool
