On 11.04.20 at 07:24, Deklan Webster wrote:
>> It's a sum in log-space, so it's incremental.
>
> Can you show me in code how to use logsumexp in such a way that I
> don't need to store 6,000,000 * k floats in memory, where k is the
> number of sampling iterations?
It is so utterly simple, it is *exactly* like doing sums. Here is the
log-sum of the values from 1 to 1000, without keeping them all in memory:
from math import log, inf
from scipy.special import logsumexp

res = -inf  # log(0), i.e. the empty sum
for x in range(1, 1001):
    res = logsumexp([res, log(x)])
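To make explicit what the accumulation computes, here is a standard-library-only sketch, where `logaddexp` is a hand-rolled stand-in for scipy's logsumexp over two values:

```python
from math import exp, inf, log

def logaddexp(a, b):
    # Numerically stable log(exp(a) + exp(b)), a pairwise logsumexp.
    if a == -inf:
        return b
    if b == -inf:
        return a
    m = max(a, b)
    return m + log(exp(a - m) + exp(b - m))

res = -inf  # log(0): the empty sum
for x in range(1, 1001):
    res = logaddexp(res, log(x))

print(res)  # equals log(1 + 2 + ... + 1000) = log(500500)
```

Only the single running scalar `res` is ever kept in memory, no matter how many terms are accumulated.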
>> Most other link prediction methods only provide a ranking or scores of
>> the edges. In order to actually predict edges, all these methods require
>> you to determine how many edges should actually be placed.
>
> In your setup you still end up with a probability in the end. What
> probability cutoff you consider to be a "real missing edge" or not is
> still arbitrary, correct?
Not quite: the only arbitrary element is the choice of a loss function.
For binary classification using the 0-1 loss (there are not many
alternatives), the optimal threshold is 1/2.
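The 1/2 threshold follows directly: if an edge exists with posterior probability p, predicting "edge" incurs expected 0-1 loss 1-p, and predicting "non-edge" incurs p, so the loss-minimizing decision flips exactly at p = 1/2. A minimal numeric check:

```python
def expected_loss(p, predict_edge):
    # Expected 0-1 loss when the edge truly exists with probability p.
    return 1.0 - p if predict_edge else p

# The optimal decision flips at p = 1/2.
for p in [0.3, 0.5, 0.7]:
    predict = expected_loss(p, True) < expected_loss(p, False)
    print(p, "edge" if predict else "non-edge")
```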
(I discuss this in the PRX paper, which I get the impression you did not
yet bother to read.)
> Using a binary classifier and whatever link
> prediction features you also can get probabilities in the end.
Not without making an _arbitrary_ choice of normalization, which results
in an _arbitrary_ number of edges being predicted.
> For
> instance, and relating to the point of choosing that f ratio you
> mention, I counted some probabilities to get an estimate on the number
> of "actually missing" edges. I've confirmed that these probabilities are
> fairly well calibrated by removing edges and predicting, etc.
>
> 39 with probability above 0.80
> 143 with probability above 0.70
> 318 with probability above 0.60
Without saying how you normalized the scores to begin with, these values
are meaningless.
>> You should also consider using a different framework, where you
>> leave the missing edges "unobserved", instead of transforming them into
>> nonedges. The distinction is important, and changes the reconstruction
>> problem. You can achieve this with MeasuredBlockState by setting the
>> corresponding n value to zero.
>
> Do you mean setting n_default=0? Just tried that and it appears like
> almost all of my existing edges are missing from the marginal graph. Not
> sure what's going on there. I tried setting mu and nu to extreme values
> to control this(?), but it doesn't appear to help much. With alpha=1,
> beta=1, mu=1, nu=9999: 20733 out of 20911 of my original edges are
> missing. What's happening here?
That is not what I meant. Setting n_default=0 means saying that every
non-edge is unobserved. You should set n=0 only for the edges you
removed, and for a comparable number of actual non-edges.
> Okay, so to summarize what's happening here: if alpha and beta are both
> 1 then p is uniform from [0,1], and it appears the model is favoring
> very low p for my graph. By setting alpha and beta I can control the
> distribution of p. By maintaining this alpha/(alpha+beta) ratio while
> increasing alpha and beta I can decrease the variance, which forces the
> p to be even higher (for the same mean), because it wants to go to the
> lower tail. That correct?
Right.
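The summary above can be checked against the closed-form moments of the Beta distribution: the mean is alpha/(alpha+beta) and the variance shrinks as alpha+beta grows, so scaling both parameters up at a fixed ratio concentrates the prior around the same mean. A sketch:

```python
def beta_moments(a, b):
    # Closed-form mean and variance of a Beta(a, b) distribution.
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

for scale in (1, 10, 100):
    a = b = 2.0 * scale  # same mean 1/2, increasing concentration
    print(scale, beta_moments(a, b))
```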
> Is there any reason not to set something very aggressive like alpha=500,
> beta=500? I just tested this as a feature in my binary classifier
> with 5000 sampling iterations and it performed much worse than the
> conditional prob setup (with only 100 sampling iterations). I expected
> this marginal prob setup to work comparably to the conditional prob
> setup. Thoughts?
Choosing alpha=beta=infinity means being 100% sure that the value of r
(probability of missing edges) is 1/2. Does that value make sense to
you? Did you remove exactly 1/2 of the edges?
Given the true value of r, the optimal choice is beta = [(1-r)/r]*alpha,
with alpha -> infinity (in practice, just a large-enough value).
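The identity is easy to verify: with beta = [(1-r)/r]*alpha, the Beta mean alpha/(alpha+beta) simplifies to r for every alpha, and increasing alpha only tightens the prior around it. A quick numeric check (r = 0.1 is an illustrative value, not taken from the thread):

```python
def beta_mean(a, b):
    return a / (a + b)

r = 0.1  # hypothetical: fraction of edges removed
for alpha in (1.0, 10.0, 1000.0):
    beta = (1.0 - r) / r * alpha
    print(alpha, beta_mean(alpha, beta))  # the mean is r for every alpha
```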
Best,
Tiago
--
Tiago de Paula Peixoto <[email protected]>
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool
