> Let's pause and appreciate the learning opportunity.

> After considerable amount of unsubstantiated speculation about bugs in
the code, and even the underlying soundness of the algorithm, the
problem turned out to be a basic misuse. One which was impossible to
guess from the information given.

> It's also a learning on my part, in fact about something that I thought
I already knew: It's a pointless exercise to try to troubleshoot
something without a _minimal_ and _complete_ example at hand.

I think you may have misunderstood me. Sorry if I wasn't communicating
clearly.  I made a typo in my second to last email swapping the numbers for
"directed" and "undirected". I sent another email immediately after
correcting the typo. Hope you saw that. I wasn't saying I'd been making a
silly mistake this whole time (as far as I can tell...). I was saying that
I accidentally set my (properly) directed graph to undirected and the
performance skyrocketed.

Unfortunately, that performance improvement was just the result of a
mistake on my part. I didn't realize graph-tool's behavior for setting a
graph to undirected was to turn reciprocal edges into 2 multiedges. So,
when I removed the sampled edges I was only removing one of the two
multiedges (in the fraction of the randomly selected which had a
reciprocal). I.e., it was basically pure data leakage. Once I corrected
that error by only sampling edges which don't have a reciprocal the
performance between undirected/directed mostly disappeared. So, performance
still bad. To ensure it wasn't just my network or something to do with
directedness I tried several simple undirected graphs from graph-tool's
collection (where I don't need to worry about multiedges). All performed
poorly.

Since I haven't seen this network reconstruction perform well on any of the
graphs I've tested I've given up on this approach. `get_edges_prob` is
working very well for me, so I'll just stick to that, despite its slowness.

Although, this possibility occurred to me: I wonder how the reconstruction
would do if I fed back the probabilities that I get from my binary
classifier using the method described in the "Incorporating Extrinsic
Uncertainty Estimates" section of your paper. I could regard it as a final
step in a link prediction flow. Seems plausible to me that with this extra
information the reconstruction might work better. I'm assuming no one has
tried this yet?

> The directed SBM does not generate reciprocal edges very well, hence it
is not a good
predictor in this case.

Where can I read more about this? I assumed the SBM handles the directed
case just fine, even if there are many reciprocal edges. Does this have
implications for community detection as well?

> That is in fact very similar to the conditional probability computed by
get_edges_prob(). The latter is more accurate, yields an actual
normalized probability, and includes information from the priors, but
for sufficiently sparse and large graphs, both computations should
coincide.

Hmm, I didn't realize this. Thanks.

Thanks for your help, as always

On Thu, Apr 16, 2020 at 2:47 AM Tiago de Paula Peixoto <[email protected]>
wrote:

> Am 16.04.20 um 02:54 schrieb Deklan Webster:
> > I, totally by accident, think I discovered the issue! In the process of
> > setting up the testing between the reconstructive method, get_edges_prob
> > method, and the simple stacked topological similarity classifier I
> > accidentally set the graph to undirected without setting it back to
> > directed. The results appeared to be reasonable finally. I've gone back
> > and tested on the graph I've been testing on, and indeed:
> >
> > Undirected: ~13/1000 correctly identified
> > Directed: 488/1000 correctly identified
> >
> > A giant improvement! Of course, I'm relieved since this solves my
> > problem (I can just, as least, set to undirected to compute this).
>
> Let's pause and appreciate the learning opportunity.
>
> After considerable amount of unsubstantiated speculation about bugs in
> the code, and even the underlying soundness of the algorithm, the
> problem turned out to be a basic misuse. One which was impossible to
> guess from the information given.
>
> It's also a learning on my part, in fact about something that I thought
> I already knew: It's a pointless exercise to try to troubleshoot
> something without a _minimal_ and _complete_ example at hand.
>
> > But,
> > now I'm wondering what's going on. Is there some predictable reason why
> > the directed case is performing so much worse? Something to do with
> > directed graphs and equlibration? But if it were to do with equlibration
> > why did this issue never affect `get_edges_prob`'s performance? Could
> > there be a bug in how graph-tool is handling the directed case? In the
> > paper it seems you've worked with the directed case before, so I assume
> > you would have noticed if there were.
>
> I'm done with the wild guesses, this is not productive.
>
> > To be sure, I tested this on the "polblogs" graph as well. Undirected:
> > reconstructive performance is great, directed: terrible.
>
> The polblogs network I actually have investigated a bit in the past.
> Despite being directed, most edges of that network are actually
> reciprocal, so in fact it is mostly an undirected network. The directed
> SBM does not generate reciprocal edges very well, hence it is not a good
> predictor in this case. The undirected model is a better fit.
>
> > I've read the documentation. The docstring says "Inverse temperature".
> > The relevant section of the docs say,
> >
> >> An alternative is to use a greedy algorithm based on a merge-split
> > MCMC with zero temperature [peixoto-merge-split-2020]
> > <
> https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#peixoto-merge-split-2020
> >,
> > which requires a single global state, and thus can reduce memory usage.
> >
> > Just from this I don't understand. Not setting beta to infinity has
> > already reduced memory usage for me. I don't understand the significance
> > of this being greedy or why it doesn't apply to my situation.If the
> > explanation of all this is in one of the papers please point to it and
> > I'll read it.
>
> The documentation assumes a basic understanding of how MCMC works,
> without which a proper use of the the methods described is not possible,
> unfortunately.
>
> The MCMC algorithm (for reconstruction) attempts to sample from a
> distribution \propto P(A,b)^beta where A is the network, b is a
> partition and beta is the inverse temperate parameter. Setting beta=1
> means sampling from P(A,b), whereas beta=infinity means finding a single
> (A,b) that maximizes the distribution. The choice beta=infinity also
> breaks ergodicity and detailed balance, making it a greedy heuristic,
> rather than a proper MCMC.
>
> To compute marginal probabilities you need beta=1, i.e. sample from the
> joint distribution, not maximize it.
>
> The point about reducing memory applied only when comparing to
> minimize_blockmodel_dl().
>
> >> This kind of comparison has been done already in
> > https://arxiv.org/abs/1802.10582 and https://arxiv.org/abs/1909.07578.
> > The SBM approach is the single best classifier among the over a hundred
> > they consider, which is marginally beat only by a stacking of around 40
> > other predictors.
> >
> > I've read both these papers and referred to the latter twice in this
> > thread. We've already discussed that these don't use either
> > `get_edges_prob` nor the full reconstructive marginal edge probability
> >
> > On page 8 of the former, they're using this as their similarity for SBM:
> > "s_ij = θ_i θ_j * l_gi,gj"
>
> That is in fact very similar to the conditional probability computed by
> get_edges_prob(). The latter is more accurate, yields an actual
> normalized probability, and includes information from the priors, but
> for sufficiently sparse and large graphs, both computations should
> coincide.
>
> Best,
> Tiago
>
> --
> Tiago de Paula Peixoto <[email protected]>
>
> _______________________________________________
> graph-tool mailing list
> [email protected]
> https://lists.skewed.de/mailman/listinfo/graph-tool
>
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool

Reply via email to