Re: [igraph] igraph R: fit_power_law

Tamas Nepusz Mon, 05 Aug 2019 07:33:02 -0700

Dear Sander,

>
>    1. The igraph documentation suggests that the bfgs function is used to
>    estimate the power law alpha, but I think the C implementation relies on
>    the  Broyden-Fletcher-Goldfarb-Shanno optimization function of the
>    lbfgs library instead. Is that correct?
>

This is the exact implementation of the BFGS optimization that we use in
power law fitting:


https://github.com/ntamas/plfit/blob/master/src/lbfgs.c

As far as I know this is the C port of the limited memory variant of the
Broyden-Fletcher-Goldfarb-Shanno method, originally written in FORTRAN. The
license notes in the source code might give you more clues.


>    1. The fit_power_law function relies on the MLE function of the stats4
>    package. I am curious why this was deprecated, given the availability of
>    plfit and MLE parameters. Is this simply a memory issue?
>

I don't know; this is purely in the domain of the R interface of igraph.
The C core uses the L-BFGS method and my "plfit" library:

https://github.com/ntamas/plfit

The plfit library is an efficient implementation of the method published by
Clauset, Shalizi and Newman:

Clauset A, Shalizi CR and Newman MEJ: Power-law distributions in empirical
data. SIAM Review 51, 661-703 (2009).
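
To make the method a bit more concrete, here is a minimal Python sketch (not
the plfit code) of two ingredients of the Clauset-Shalizi-Newman procedure
for continuous data with a fixed lower cutoff: the closed-form maximum
likelihood estimate of the exponent, and the Kolmogorov-Smirnov distance
between the data and the fitted distribution. The function names and the
fixed xmin are assumptions made for illustration; plfit also searches for
xmin and handles discrete data, neither of which is attempted here.

```python
import math


def fit_alpha_continuous(data, xmin):
    """MLE of the exponent of a continuous power law with a fixed xmin.

    Uses the closed form alpha = 1 + n / sum(ln(x_i / xmin)) over the
    observations with x_i >= xmin (hypothetical helper, not a plfit function).
    """
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0:
        raise ValueError("all tail observations equal xmin; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        # CDF of a continuous power law: P(X <= x) = 1 - (x / xmin)^(1 - alpha)
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model_cdf - i / n), abs((i + 1) / n - model_cdf))
    return d
```

The p-value discussed below is then obtained by comparing this distance with
the distances seen for data that really do follow the fitted distribution;
see the paper for how that comparison is set up.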


>
>    1. How to interpret the p-value of the Kolmogorov-Smirnov test?
>

See the paper cited above for more details.


>
>    1. The igraph help file states: "Small p-values (less than 0.05)
>    indicate that the test rejected the hypothesis that the original data could
>    have been drawn from the fitted power-law distribution" . The C
>    implementation of the KS test in igraph uses the Hurwitz Zeta function.
>    Shouldn't this mean that *high* p-values indicate a good model fit, as
>    suggested by Clauset et al (2009:678)?
>

Well, tests based on p-values are not really about whether a model is a
"good fit" or a "bad fit". A low p-value _roughly_ says that "it is very
unlikely that the data could have been generated from the hypothesized
distribution" (in our case, a power-law). A high p-value _roughly_ means
that "the data may have come from the hypothesized distribution"; however,
there could be alternative distributions that describe the data just as
well.

So, in a nutshell:

low p-value --> null hypothesis (power-law) rejected --> data is unlikely
to follow a power-law
high p-value --> null hypothesis (power-law) _not_ rejected --> data could
come from a power-law, or maybe from something else; we don't know, we just
could not _exclude_ the power-law
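
The same decision rule can be written as a few lines of Python. This is only
an illustrative sketch: the function name is hypothetical, and the 0.05
threshold is simply the one quoted from the igraph help file above.

```python
def interpret_ks_p_value(p_value, significance=0.05):
    """Turn a KS p-value into a verdict on the power-law null hypothesis."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < significance:
        # Null hypothesis rejected: the data are unlikely to be power-law.
        return "power-law rejected"
    # Not rejected: a power-law remains plausible, but so may other
    # distributions; this is not evidence that the power-law is correct.
    return "power-law not rejected"
```

Note that the second outcome is deliberately worded as "not rejected" rather
than "accepted".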

All the best,
T.

_______________________________________________
igraph-help mailing list
igraph-help@nongnu.org
https://lists.nongnu.org/mailman/listinfo/igraph-help
