Err, I meant, of course:

```
Directed:   ~13/1000 correctly identified
Undirected: 488/1000 correctly identified
```
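
For anyone hitting the same thing, here is a minimal sketch of the undirected workaround, following the measured-network reconstruction recipe from the graph-tool inference HOWTO; the polblogs graph, the uniform n = x = 2 measurement counts, and the sweep counts are placeholder assumptions, not the exact settings from my tests:

```python
import graph_tool.all as gt

# Placeholder example graph; polblogs is directed in the collection.
g = gt.collection.data["polblogs"].copy()
gt.remove_self_loops(g)
gt.remove_parallel_edges(g)

# The workaround: treat the graph as undirected before reconstruction.
g.set_directed(False)

# Uniform measurement counts, as in the HOWTO example (assumed values).
n = g.new_ep("int", 2)   # number of measurements per observed edge
x = g.new_ep("int", 2)   # number of positive observations per observed edge
state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0)

# Equilibrate, then collect marginal posterior edge probabilities.
gt.mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))

u = None
def collect_marginals(s):
    global u
    u = s.collect_marginal(u)

gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

eprob = u.ep.eprob   # marginal probability of each candidate edge in u
```

The only change between the two runs compared above is whether `set_directed(False)` is applied before building the state.
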
On Wed, Apr 15, 2020 at 8:54 PM Deklan Webster <[email protected]> wrote:

> I, totally by accident, think I discovered the issue! In the process of setting up the testing between the reconstructive method, the `get_edges_prob` method, and the simple stacked topological similarity classifier, I accidentally set the graph to undirected without setting it back to directed. The results finally appeared to be reasonable. I've gone back and tested on the graph I've been testing on, and indeed:
>
> Undirected: ~13/1000 correctly identified
> Directed: 488/1000 correctly identified
>
> A giant improvement! Of course, I'm relieved, since this solves my problem (I can just, at least, set the graph to undirected to compute this). But now I'm wondering what's going on. Is there some predictable reason why the directed case is performing so much worse? Something to do with directed graphs and equilibration? But if it were to do with equilibration, why did this issue never affect `get_edges_prob`'s performance? Could there be a bug in how graph-tool is handling the directed case? In the paper it seems you've worked with the directed case before, so I assume you would have noticed if there were.
>
> To be sure, I tested this on the "polblogs" graph as well. Undirected: reconstructive performance is great; directed: terrible.
>
> > You didn't even have to read any paper, just the docstring would have been enough.
> >
> > Beta is the inverse temperature parameter, and setting it to infinity means turning the MCMC into a greedy optimization heuristic.
> >
> > And I don't "recommend it". It is not applicable to your context.
>
> I've read the documentation. The docstring says "Inverse temperature". The relevant section of the docs says:
>
> > An alternative is to use a greedy algorithm based on a merge-split MCMC with zero temperature [peixoto-merge-split-2020] <https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#peixoto-merge-split-2020>, which requires a single global state, and thus can reduce memory usage.
>
> Just from this I don't understand. Not setting beta to infinity has already reduced memory usage for me. I don't understand the significance of this being greedy, or why it doesn't apply to my situation. If the explanation of all this is in one of the papers, please point to it and I'll read it.
>
> > To be honest, I think the pattern of saying "I plan to read your documentation/paper at some point, but could you please just explain this to me before I do so" is a bit disrespectful. Why is my time worth less than yours?
>
> I'm sorry it seems that way to you. I have read all of the documentation at least once, and some parts of the papers. I don't think my time is more valuable. I would happily volunteer to contribute to the docs or to the project in general in any way I can. I've found some more typos in the docs in addition to the one I already told you about; I can at least contribute these. Also, I'll email you those paper typos after this.
>
> > This kind of comparison has been done already in https://arxiv.org/abs/1802.10582 and https://arxiv.org/abs/1909.07578. The SBM approach is the single best classifier among the over a hundred they consider, which is marginally beaten only by a stacking of around 40 other predictors.
>
> I've read both these papers and referred to the latter twice in this thread. We've already discussed that these use neither `get_edges_prob` nor the full reconstructive marginal edge probability.
>
> On page 8 of the former, they're using this as their similarity for the SBM: "s_ij = θ_i θ_j * l_{g_i, g_j}"
>
> As I've already told you, the "score" I'm getting with `get_edges_prob` is also the *most impactful feature of all the features I've tested* (including many newer network embedding methods not considered in the aforementioned papers). This is impressive. Those papers didn't test this, as far as I'm aware. The reconstructive approach, however, was giving me worse results. This discrepancy is what I was trying to account for (and did, I now think; see above).
>
> > In any case that was not the point of the PRX paper, which was to develop an actual Bayesian reconstruction algorithm, not a binary classifier. AFAIK there is no other algorithm that does this, so there was nothing to compare to. If you're worried about comparing with binary classifiers, you can just convert this approach into one by using the marginal posterior probabilities as "scores" and go from there, as the papers above do. Then you are comparing apples to apples.
>
> Yes, this is my intention, as I've stated.
>
> Thanks for your help, as always
>
> On Wed, Apr 15, 2020 at 2:04 AM Tiago de Paula Peixoto <[email protected]> wrote:
>
>> On 15.04.20 at 05:40, Deklan Webster wrote:
>> > Most typically, edge prediction is formulated as a binary classification task [7], in which each missing (or spurious) edge is attributed a "score" (which may or may not be a probability), so that those that reach a prespecified discrimination threshold are classified as true edges (or true nonedges)
>> >
>> > I think this is misleading. This is not typical. As far as I can tell, very few people use unsupervised binary classification for link prediction. Most typically, edge prediction is formulated as a *supervised* binary classification task. From that setup, you can calibrate probabilities based on ability to predict.
>>
>> It's not misleading at all, this is exactly how a binary classifier works, supervised or otherwise. How you find the threshold is beside the point.
>>
>> > Indeed selecting the optimal threshold can be done by cross validation in a *supervised* setting, but even then this will depend in general on the fraction of removed edges, size of test set, etc.
>> >
>> > I agree, this is a valid concern. If this is your objection to the *supervised* binary classification formulation then I think this should be the statement in the paper.
>>
>> This is just another difference. And I wasn't "objecting", just explaining what you had misunderstood.
>>
>> > Well, I have run it multiple times with different numbers. To be sure, I just now ran it with the epsilon removed, 2000 wait, multiflip on, and then 100k(!) sampling iterations. Results were pretty much the same.
>>
>> It's a shame.
>>
>> > I noticed that in the docs you now recommend setting beta=np.inf when using multiflip. What exactly is this doing? (I plan to soon read that other paper you mention, which probably includes this...)
>>
>> You didn't even have to read any paper, just the docstring would have been enough.
>>
>> Beta is the inverse temperature parameter, and setting it to infinity means turning the MCMC into a greedy optimization heuristic.
>>
>> And I don't "recommend it". It is not applicable to your context.
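
(For context, this is how I understand the two modes: a minimal sketch adapted from the merge-split example in the inference HOWTO; the example graph and the sweep counts are arbitrary placeholders.)

```python
import numpy as np
import graph_tool.all as gt

g = gt.collection.data["football"]   # placeholder graph
state = gt.NestedBlockState(g)

# beta = 1 (the default): the merge-split chain samples from the posterior,
# which is what posterior averages (marginals, get_edges_prob) require.
for _ in range(100):
    state.multiflip_mcmc_sweep(beta=1, niter=10)

# beta = np.inf: the zero-temperature limit, i.e. a greedy heuristic that
# essentially only accepts moves improving the description length; it
# converges to a single partition and is meant for optimization, not sampling.
for _ in range(100):
    state.multiflip_mcmc_sweep(beta=np.inf, niter=10)
```
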
>>
>> To be honest, I think the pattern of saying "I plan to read your documentation/paper at some point, but could you please just explain this to me before I do so" is a bit disrespectful. Why is my time worth less than yours?
>>
>> > I noticed that in your paper you didn't compare your reconstruction method's performance against any baseline. How do you know how well it's performing if you don't have a baseline? I'm currently pessimistic given its performance on the graph I'm testing. Some of those graphs you were testing on in the paper are loaded into graph-tool, right? It would be fairly easy to train a RandomForest (no fancy boosted trees necessary) with the stacked similarity measures from graph-tool (and maybe a few other simple features I have in mind...) and test the performance against your reconstruction approach (at least for just link prediction). Interested in this? Conjectures? I would be willing to do it for some of the moderately-sized graphs.
>>
>> This kind of comparison has been done already in https://arxiv.org/abs/1802.10582 and https://arxiv.org/abs/1909.07578. The SBM approach is the single best classifier among the over a hundred they consider, which is marginally beaten only by a stacking of around 40 other predictors.
>>
>> In any case that was not the point of the PRX paper, which was to develop an actual Bayesian reconstruction algorithm, not a binary classifier. AFAIK there is no other algorithm that does this, so there was nothing to compare to. If you're worried about comparing with binary classifiers, you can just convert this approach into one by using the marginal posterior probabilities as "scores" and go from there, as the papers above do. Then you are comparing apples to apples.
>>
>> If you have further questions about how to use the library, please go ahead and ask. But if you want to discuss how to compare supervised vs unsupervised edge prediction, etc., please take this off this list since it's off-topic.
>>
>> Best,
>> Tiago
>>
>> --
>> Tiago de Paula Peixoto <[email protected]>
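
For the record, here is the rough shape of the apples-to-apples comparison I have in mind, under stated assumptions: `u` is the marginal graph collected during reconstruction (as in the sketch above), `candidates` is a hypothetical list of held-out vertex-index pairs with labels `y_true`, and the similarity choices and forest size are only illustrative:

```python
import numpy as np
import graph_tool.all as gt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def marginal_scores(u, candidates):
    """Marginal posterior probability for each candidate pair (0 if never sampled)."""
    eprob = u.ep.eprob
    out = []
    for s, t in candidates:
        e = u.edge(s, t)
        out.append(eprob[e] if e is not None else 0.0)
    return np.array(out)

def similarity_features(g, candidates):
    """Stacked topological similarities as a feature matrix (illustrative choice)."""
    sims = ["jaccard", "dice", "inv-log-weight"]
    return np.column_stack([gt.vertex_similarity(g, s, vertex_pairs=candidates)
                            for s in sims])

# Hypothetical usage, assuming candidates/y_true come from a held-out edge split
# and (train_pairs, y_train) from a separate training split:
#   auc_sbm = roc_auc_score(y_true, marginal_scores(u, candidates))
#   clf = RandomForestClassifier(n_estimators=500)
#   clf.fit(similarity_features(g, train_pairs), y_train)
#   auc_rf = roc_auc_score(y_true,
#                          clf.predict_proba(similarity_features(g, candidates))[:, 1])
```

Both approaches then yield one score per candidate pair, so they can be compared with the same threshold-free metric.
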
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool
