On 11.04.20 at 07:24, Deklan Webster wrote:
>> It's a sum in log-space, so it's incremental.
>
> Can you show me in code how to use logsumexp in such a way that I
> don't need to store 6,000,000 * k floats in memory, where k is the
> number of sampling iterations?
It is so utterly simple, it is *exactly* like doing sums. Here is the
log-sum of the values from 1 to 1000, without keeping them all in memory:
from math import log, inf
from scipy.special import logsumexp

res = -inf  # log(0), i.e. the empty sum
for x in range(1, 1001):
    res = logsumexp([res, log(x)])
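To make explicit what the accumulation computes, here is a standard-library-only sketch, where `logaddexp` is a hand-rolled stand-in for scipy's logsumexp over two values:

```python
from math import exp, inf, log

def logaddexp(a, b):
    # Numerically stable log(exp(a) + exp(b)), a pairwise logsumexp.
    if a == -inf:
        return b
    if b == -inf:
        return a
    m = max(a, b)
    return m + log(exp(a - m) + exp(b - m))

res = -inf  # log(0): the empty sum
for x in range(1, 1001):
    res = logaddexp(res, log(x))

print(res)  # equals log(1 + 2 + ... + 1000) = log(500500)
```

Only the single running scalar `res` is ever kept in memory, no matter how many terms are accumulated.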
>> Most other link prediction methods only provide a ranking or scores of
>> the edges. In order to actually predict edges, all these methods require
>> you to determine how many edges should actually be placed.
>
> In your setup you still end up with a probability in the end. What
> probability cutoff you consider to be a "real missing edge" or not is
> still arbitrary, correct?
Not quite: the only arbitrary element is the choice of a loss function.
For binary classification using the 0-1 loss (there are not many
alternatives), the optimal threshold is 1/2.
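The 1/2 threshold follows directly: if an edge exists with posterior probability p, predicting "edge" incurs expected 0-1 loss 1-p, and predicting "non-edge" incurs p, so the loss-minimizing decision flips exactly at p = 1/2. A minimal numeric check:

```python
def expected_loss(p, predict_edge):
    # Expected 0-1 loss when the edge truly exists with probability p.
    return 1.0 - p if predict_edge else p

# The optimal decision flips at p = 1/2.
for p in [0.3, 0.5, 0.7]:
    predict = expected_loss(p, True) < expected_loss(p, False)
    print(p, "edge" if predict else "non-edge")
```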
(I discuss this in the PRX paper, which I get the impression you did not
yet bother to read.)
> Using a binary classifier and whatever link
> prediction features you also can get probabilities in the end.
Not without making an _arbitrary_ choice of normalization, which results
in an _arbitrary_ number of edges being predicted.
> For
> instance, and relating to the point of choosing that f ratio you
> mention, I counted some probabilities to get an estimate on the number
> of "actually missing" edges. I've confirmed that these probabilities are
> fairly well calibrated by removing edges and predicting, etc.
>
> 39 with probability above 0.80
> 143 with probability above 0.70
> 318 with probability above 0.60
Without saying how you normalized the scores to begin with, these values
are meaningless.
>> You should also consider using a different framework, where you
>> leave the missing edges "unobserved", instead of transforming them into
>> nonedges. The distinction is important, and changes the reconstruction
>> problem. You can achieve this with MeasuredBlockState by setting the
>> corresponding n value to zero.
>
> Do you mean setting n_default=0? Just tried that and it appears like
> almost all of my existing edges are missing from the marginal graph. Not
> sure what's going on there. I tried setting mu and nu to extreme values
> to control this(?), but it doesn't appear to help much. With alpha=1,
> beta=1, mu=1, nu=9999: 20733 out of 20911 of my original edges are
> missing. What's happening here?
That is not what I meant. Setting n_default=0 means saying that every
non-edge is unobserved. You should set n=0 only for the edges you
removed, and for a comparable number of actual non-edges.
> Okay, so to summarize what's happening here: if alpha and beta are both
> 1 then p is uniform from [0,1], and it appears the model is favoring
> very low p for my graph. By setting alpha and beta I can control the
> distribution of p. By maintaining this alpha/(alpha+beta) ratio while
> increasing alpha and beta I can decrease the variance, which forces the
> p to be even higher (for the same mean), because it wants to go to the
> lower tail. That correct?
Right.
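The summary above can be checked against the closed-form moments of the Beta distribution: the mean is alpha/(alpha+beta) and the variance shrinks as alpha+beta grows, so scaling both parameters up at a fixed ratio concentrates the prior around the same mean. A sketch:

```python
def beta_moments(a, b):
    # Closed-form mean and variance of a Beta(a, b) distribution.
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

for scale in (1, 10, 100):
    a = b = 2.0 * scale  # same mean 1/2, increasing concentration
    print(scale, beta_moments(a, b))
```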
> Is there any reason not to set something very aggressive like alpha=500,
> beta=500? I just tested this as a feature in my binary classifier
> with 5000 sampling iterations and it performed much worse than the
> conditional prob setup (with only 100 sampling iterations). I expected
> this marginal prob setup to work comparably to the conditional prob
> setup. Thoughts?
Choosing alpha=beta=infinity means being 100% sure that the value of r
(probability of missing edges) is 1/2. Does that value make sense to
you? Did you remove exactly 1/2 of the edges?
Given the true value of r, the optimal choice is beta = [(1-r)/r]*alpha,
with alpha -> infinity (in practice, just a large-enough value).
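The identity is easy to verify: with beta = [(1-r)/r]*alpha, the Beta mean alpha/(alpha+beta) simplifies to r for every alpha, and increasing alpha only tightens the prior around it. A quick numeric check (r = 0.1 is an illustrative value, not taken from the thread):

```python
def beta_mean(a, b):
    return a / (a + b)

r = 0.1  # hypothetical: fraction of edges removed
for alpha in (1.0, 10.0, 1000.0):
    beta = (1.0 - r) / r * alpha
    print(alpha, beta_mean(alpha, beta))  # the mean is r for every alpha
```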
Best,
Tiago
--
Tiago de Paula Peixoto <[email protected]>
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool
