Dear all,
I always try to refrain from getting into these discussions, but I can no longer resist the temptation. Here are some more ideas that I hope bring more light than confusion:
- There must be some functional relationship between the FSC and the SNR, but the exact analytical form of this relationship is unknown (I suspect it must at least be monotonic: the worse the SNR, the worse the FSC; but even this is difficult to prove). The relationship we normally use, FSC = SNR/(1+SNR), was derived in a context that does not apply to cryo-EM (1D stationary signals in real space; our molecules are not stationary), and consequently any justification of any threshold based on this relationship is incorrect (see our review). A compressed sketch of where this relation comes from, and of the approximations it hides, follows after this list.
- Still, as long as we all use the same threshold, the reported resolutions are comparable to each other. In that regard, I am happy that we have settled on 0.143 as the standard (although any other number would have served the purpose).
- I fully agree with Steve that the full FSC is much more informative than its crossing of the threshold, especially because we should be much more worried about its behavior where it has high values than where it has low values. Before crossing the threshold it should be as high as possible, and that is the "true measure" of the goodness of the map. Where it crosses the 0.143 threshold the SNR is very low, and by definition that is a very unstable part of the FSC, resulting in relatively unstable resolution reports. We ran some tests on the variability of the FSC (refining random splits of the dataset), trying to put in the error bars that Steve was asking for, and it turned out to be quite reproducible (rather low variance, except in the region where it crosses the threshold) as long as the dataset was large enough (which is the current state of affairs). A toy numerical illustration of this half-split test also follows after this list.
- @Marin, I always suffer with your references to sloppy statistics. If we take your 2005 paper in which the 1/2-bit criterion was proposed (
https://www.sciencedirect.com/science/article/pii/S1047847705001292),
Eqs. 4 to 15 completely ignore the fact that you are dealing with Fourier components, which are complex numbers. Consequently you have to deal with random variables that have two components; moreover, the real and imaginary parts are not independent, and they in turn are not independent of the nearby Fourier coefficients, so that for computing radial averages you would need to account for the correlation among coefficients (
https://www.aimspress.com/fileOther/PDF/biophysics/20150102.pdf).
To treat the statistics properly, one needs to carry out at least a two-dimensional argument, including the complex-conjugate multiplication, all of which is missing from your derivation, rather than treating everything as one-dimensional, real-valued random variables. Additionally, embedded in your whole reasoning is the idea that the expected value of a ratio is the ratio of the expected values, which is only the 0-th order Taylor approximation to the mean of the distribution of a ratio of two random variables.
Finally, I have always found it extremely difficult to understand the 1-bit and 1/2-bit criteria, that is, the relationship between Shannon's channel-capacity formula (
https://en.wikipedia.org/wiki/Shannon%E2%80%93Hartley_theorem)
and our FSC. We do not have any channel through which we are "transmitting" our volume; although it is true that we have a model y = x + n, the same as in signal transmission, it is not true that the average information content of a signal is log2(1+SNR). For me, the only relationship is that the SNR appears in both formulas, FSC and channel capacity, but that does not automatically make them comparable and interchangeable. This is not a criticism of your work. I think the FSC is a very useful tool for measuring some properties of the reconstruction process and the quality of the dataset (not everything is measured by the FSC), and it also has its drawbacks (for instance, systematic errors are rewarded by the FSC, as they are reproducible in both halves). Moreover, I think you are an extremely intelligent person, whom I consider a good friend, with very good intuition about image processing, and you have brought very interesting ideas and methodologies into the field. It is just that we cannot obsess over the FSC threshold and the reported resolution, since the most interesting part of the FSC is not where it is low, but where it is high.
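To make the first and last statistical points above concrete, here is a compressed sketch in my own notation (mine, not taken from any of the papers under discussion). Write the half-map Fourier coefficients on one shell as X_i = S_i + N_i and Y_i = S_i + M_i (S the signal, N and M independent noise). Then

\mathrm{FSC} \;=\; \frac{\operatorname{Re}\sum_i X_i Y_i^{*}}{\sqrt{\sum_i |X_i|^2}\,\sqrt{\sum_i |Y_i|^2}}
\;=\; \frac{\sum_i |S_i|^2 \;+\; \operatorname{Re}\sum_i \bigl( S_i M_i^{*} + N_i S_i^{*} + N_i M_i^{*} \bigr)}{\sqrt{\sum_i |X_i|^2}\,\sqrt{\sum_i |Y_i|^2}}

Only by setting the cross terms to zero AND replacing the ratio of sums by the ratio of expectations does one obtain the textbook relation

\mathrm{FSC} \;\approx\; \frac{\sigma_S^2}{\sigma_S^2 + \sigma_N^2} \;=\; \frac{\mathrm{SNR}}{1+\mathrm{SNR}},

whereas in general, for two random variables U and V (delta method),

E\!\left[\frac{U}{V}\right] \;=\; \frac{E[U]}{E[V]} \;-\; \frac{\operatorname{Cov}(U,V)}{E[V]^{2}} \;+\; \frac{E[U]\,\operatorname{Var}(V)}{E[V]^{3}} \;+\; \cdots

of which only the zeroth-order term is kept. And the Shannon-Hartley capacity, C = B log2(1+SNR) bits per second, shares nothing with the above beyond the appearance of the SNR.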
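And here is a minimal numerical illustration of the half-split variability test mentioned above. It is a synthetic 2D toy (so FRC rather than FSC), with constants tuned by hand; a sketch of the idea, not our actual experiment.

import numpy as np

rng = np.random.default_rng(1)
n, n_img, n_shells = 64, 2000, 24

# Synthetic "structure": a white spectrum shaped by a Gaussian fall-off,
# so that the correlation curve crosses 0.143 somewhere inside the band.
f1d = np.fft.fftfreq(n)
f = np.sqrt(f1d[:, None] ** 2 + f1d[None, :] ** 2)
spec = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) * np.exp(-((f / 0.12) ** 2))
signal = np.fft.ifft2(spec).real
signal /= signal.std()

stack = signal + rng.normal(scale=6.0, size=(n_img, n, n))  # noisy "particles"

shells = np.minimum((f / 0.5 * n_shells).astype(int), n_shells - 1).ravel()

def frc(a, b):
    """Fourier Ring Correlation (2D analogue of the FSC) over n_shells rings."""
    fa, fb = np.fft.fft2(a).ravel(), np.fft.fft2(b).ravel()
    num = np.bincount(shells, np.real(fa * np.conj(fb)), n_shells)
    den = np.sqrt(np.bincount(shells, np.abs(fa) ** 2, n_shells) *
                  np.bincount(shells, np.abs(fb) ** 2, n_shells))
    return num / den

curves = []
for _ in range(20):  # 20 random half-splits of the same stack
    order = rng.permutation(n_img)
    curves.append(frc(stack[order[::2]].mean(0), stack[order[1::2]].mean(0)))
curves = np.asarray(curves)

# The spread is small where the curve is high and largest around the region
# where it crosses 0.143 -- the "unstable part" discussed above.
print(np.round(curves.mean(0), 3))
print(np.round(curves.std(0), 3))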
I hope I can keep restraining myself in the future :-)
Cheers, Carlos Oscar
On 2/21/20 6:19 PM, Ludtke, Steven J. wrote:
I've been steadfastly refusing to get dragged in this time, but with this very sensible statement (with which I largely agree), I thought I'd throw in one thought, just to stir the pot a little more. This is not a new idea, but I think it is the most sensible strategy I've heard proposed, and it addresses Marin's concerns in a more conventional way.
What we are talking about here is the statistical noise present in the FSC curves themselves. Viewed within the framework of traditional error analysis and propagation of uncertainties, which pretty much every scientist has been familiar with since high school (and which thus would not confuse non-statisticians), the 'correct' solution to this issue is not to adjust the threshold, but to present FSC curves with error bars.
One can then use a fixed threshold at a level based on expectation values, and simply produce a resolution value which also has an associated uncertainty. This is much better than using a variable threshold and still producing a single number with no uncertainty estimate! Not only does this approach account for the statistical noise in the FSC curve, it should also stop people from reporting resolutions like 2.3397 Å, as it would be silly to say 2.3397 +- 0.2.
The cross terms are not ignored, but are used in the production of the error bars. This is a very simple approach, which is certainly closer to being correct than the fixed-threshold-without-error-bars approach, and it solves many of the issues we have with how people report resolution. Of course we will still have people who insist that 3.2 +- 0.2 is better than 3.3 +- 0.2, but there isn't much you can do about them... (other than beat them over the head with a statistics textbook).
The caveat, of course, is that like all propagation of uncertainty, this is a linear approximation, and the correlation axis isn't linear, so the typical Normal distributions with linear propagation used to justify propagation of uncertainty aren't _strictly_ applicable. However, the approximation is fine as long as the error bars are reasonably small compared to the -1 to 1 range of the correlation axis. Each individual error bar is computed around its own expectation value, so the overall nonlinearity of the correlation axis isn't a concern. A sketch of what such a scheme could look like follows.
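For concreteness, here is one way this could be implemented. The per-shell error is my own stand-in (the textbook large-sample standard error of a correlation coefficient), and how to count the independent coefficients per shell is itself debatable, as this thread shows:

import numpy as np

def fsc_standard_error(fsc_curve, n_indep):
    """Large-sample standard error of a correlation coefficient, applied
    shell by shell. n_indep = number of independent Fourier coefficients
    in each shell (roughly half the voxel count, by Friedel symmetry);
    how to count these properly is exactly what is under debate here."""
    fsc_curve = np.asarray(fsc_curve, dtype=float)
    n_indep = np.asarray(n_indep, dtype=float)
    # a linear (delta-method) approximation, per the caveat above
    return (1.0 - fsc_curve ** 2) / np.sqrt(np.maximum(n_indep - 3.0, 1.0))

def resolution_with_error(freqs, fsc_curve, se, thresh=0.143):
    """Threshold crossing of the FSC, with the crossings of fsc +- se
    used as the error bar. freqs in 1/Angstrom, ascending."""
    freqs = np.asarray(freqs, dtype=float)
    fsc_curve = np.asarray(fsc_curve, dtype=float)
    se = np.asarray(se, dtype=float)

    def first_cross(curve):
        below = np.nonzero(curve < thresh)[0]
        if len(below) == 0 or below[0] == 0:
            return freqs[-1]  # never crosses inside the measured band
        i = below[0]
        # linear interpolation between the two shells around the crossing
        t = (curve[i - 1] - thresh) / (curve[i - 1] - curve[i])
        return freqs[i - 1] + t * (freqs[i] - freqs[i - 1])

    f0 = first_cross(fsc_curve)
    f_lo, f_hi = first_cross(fsc_curve - se), first_cross(fsc_curve + se)
    return 1.0 / f0, abs(1.0 / f_lo - 1.0 / f_hi) / 2.0

With something like this in hand, one would report, say, 2.3 +- 0.2 Å instead of a bare 2.3397 Å.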
--------------------------------------------------------------------------------------
Steven Ludtke, Ph.D. <slud...@bcm.edu>
Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology
Dept. of Biochemistry and Molecular Biology (www.bcm.edu/biochem)
Academic Director, CryoEM Core (cryoem.bcm.edu)
Co-Director, CIBR Center (www.bcm.edu/research/cibr)
On Feb 21, 2020, at 10:34 AM, Alexis Rohou <a.ro...@gmail.com> wrote:
Hi all,
For those bewildered by Marin's insistence that everyone has been messing up their stats since the Bronze Age, I'd like to offer my understanding of the situation. More details are in this thread from a few years ago on the exact same topic:
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003939.html
https://mail.ncmir.ucsd.edu/pipermail/3dem/2015-August/003944.html
Notwithstanding notational problems (e.g. strict equalities as opposed to approximation symbols, or the omission of symbols to denote estimation), I believe Frank & Al-Ali and "descendant" papers (e.g. the appendix of Rosenthal & Henderson 2003) are fine. The cross terms that Marin is agitated about do indeed have an expectation value of 0.0 (in the ensemble, i.e. if the experiment were performed an infinite number of times with different realizations of the noise). I don't believe Pawel or Jose Maria or any of the other authors really believe that the cross terms vanish exactly in any single experiment.
When N (the number of independent Fourier voxels in a shell) is large enough, mean(Signal x Noise) ~ 0.0 is only an approximation, but a pretty good one, even for a single FSC experiment. This is why, in my book, derivations that depend on Frank & Al-Ali are OK, under the strict assumption that N is large. Numerically, this becomes apparent when Marin's half-bit criterion is plotted: asymptotically it behaves like a constant threshold (see the sketch below).
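A quick numerical check of that last point. The half-bit threshold formula below is the one printed in van Heel & Schatz (2005), quoted here from memory (so check the paper); n is the effective number of independent voxels in a shell, which is where box size and symmetry enter:

from math import sqrt

def half_bit_threshold(n):
    """Half-bit FSC threshold for a shell with n effective independent
    voxels (van Heel & Schatz 2005; quoted from memory)."""
    return (0.2071 + 1.9102 / sqrt(n)) / (1.2071 + 0.9102 / sqrt(n))

for n in (10, 100, 1000, 10000, 1000000):
    print(f"n = {n:>8d}   T(1/2 bit) = {half_bit_threshold(n):.4f}")

# For large n this tends to 0.2071/1.2071 ~= 0.17, i.e. effectively a
# constant threshold; only for small n (small boxes, high symmetry,
# small objects in large boxes) does it rise appreciably.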
So, is Marin wrong to worry about this? No, I don't think so. There are indeed cases where the assumption of large N breaks down, and under those circumstances any fixed threshold (0.143, 0.5, whatever) is dangerous. This is illustrated in the figures of van Heel & Schatz (2005). Small boxes, high symmetry, small objects in large boxes, and a number of other conditions can make fixed thresholds dangerous.
It would indeed be better to use a non-fixed threshold. So why am I not using the 1/2-bit criterion in my own work? While numerically it behaves well over most resolution ranges, I was not convinced by Marin's derivation in 2005. Philosophically, though, I think he's right: we should aim for FSC thresholds that are more robust to the kinds of edge cases mentioned above. It would be the right thing to do.
Hope this helps,
Alexis
On Sun, Feb 16, 2020 at 9:00 AM Penczek, Pawel A <
pawel.a.penc...@uth.tmc.edu> wrote:
Marin,
The statistics in the 2010 review is fine. You may disagree with the assumptions, but I can assure you that the "statistics" (as you call it) is fine. A careful reading of the paper would reveal this much to you.
Regards,
Pawel
On Feb 16, 2020, at 10:38 AM, Marin van Heel <
marin.vanh...@googlemail.com> wrote:
Dear Pawel and All others ....
This 2010 review is - unfortunately - largely based on the flawed statistics I mentioned before, namely on the a priori assumption that the inner product of a signal vector and a noise vector is ZERO (an orthogonality assumption). We have refuted the (Frank & Al-Ali 1975) paper on a number of occasions (for example in 2005, and most recently in our bioRxiv paper), but you still take that as the correct relation between SNR and FRC (and you never cite the criticism...).
Sorry
Marin
On Thu, Feb 13, 2020 at 10:42 AM Penczek, Pawel A <
pawel.a.penc...@uth.tmc.edu> wrote:
Dear Teige,
I am wondering whether you are familiar with:
Penczek PA. Resolution measures in molecular electron microscopy. Methods Enzymol. 2010;482:73-100. doi: 10.1016/S0076-6879(10)82003-8.
You will find there answers to all the questions you asked, and much more.
Regards,
Pawel
_______________________________________________
3dem mailing list
3...@ncmir.ucsd.edu
https://mail.ncmir.ucsd.edu/mailman/listinfo/3dem