Re: [ccp4bb] Continuous-Single Versus Coarse-Multiple Sampling

David Waterman Fri, 23 Jan 2015 14:09:13 -0800

Hi Jacob,

My intuition for the line fit case was exactly the opposite to yours. My
reasoning is a sort of physical one. If you imagine the line as a stiff rod
with hooks for masses evenly spread along its length at 10 positions, then
the object where you put 5 masses at positions 1 and 10 has a greater
moment of inertia than the case where there is one mass at each position.
This tells me that changes to the masses in the further spread case would
be more effective (faster) at changing the orientation of the rod. Then
it's a bit of a leap, I admit, but I felt that measurements of a line's height
made in such a way would be more effective at determining the fit
parameters than the evenly spread case.
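
The rod picture can be made concrete with a small sketch. Assuming unit
masses, the moment of inertia about the centre of mass is just the sum of
squared distances from the mean position. The helper name moment_of_inertia
is only illustrative and is not part of the script further down.

```r
# Moment of inertia of unit masses about their centre of mass.
# (Illustrative helper; the name is an assumption, not from the original script.)
moment_of_inertia <- function(x) sum((x - mean(x))^2)

even_spaced_x <- seq(1, 10)                  # one mass at each of 10 hooks
heavy_ended_x <- c(rep(1, 5), rep(10, 5))    # five masses at each end

I_even  <- moment_of_inertia(even_spaced_x)
I_heavy <- moment_of_inertia(heavy_ended_x)

cat("even-spaced:", I_even, "\n")
cat("heavy-ended:", I_heavy, "\n")
```

The heavy-ended arrangement comes out at 202.5 against 82.5 for the evenly
spread one, so roughly 2.5 times larger.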


But I decided not to rely on intuition when I could simulate it. At the
bottom of the message is a script in the R language that does 20000 line
fits of the function y = m*x + c, where c = 100, m = 1 and x is the
sequence 1..10 for the first 10000 fits and (1, 1, 1, 1, 1, 10, 10, 10, 10,
10) for the rest of them. The data being fit to are 'measurements' taken by
adding a standard normal deviate to x + 100 at each position.

After running all these simulations (it takes a few seconds) the script
then calculates the means and standard deviations of the fit parameters, m
and c. The standard deviations are interesting:

mean intercept of fit1: 99.99619
sd of intercept of fit1 0.6833687

mean intercept of fit2: 99.99909
sd of intercept of fit2 0.498759

mean gradient of fit1: 1.00076
sd of gradient of fit1 0.1100296

mean gradient of fit2: 0.9996967
sd of gradient of fit2 0.07021113

Fit 2, the 'heavy-ended' case, has a tighter distribution for both the
intercept and the gradient. It is therefore the more precise way of fitting
the line.
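
As a cross-check on the simulated spreads, here is a minimal sketch using the
textbook least-squares standard errors for a straight line with independent
noise of known standard deviation sigma (sigma = 1, as with rnorm in the
script). The function name analytic_sd is hypothetical and is not part of the
script below.

```r
# Textbook standard errors for y = m*x + c fitted by ordinary least squares,
# assuming independent noise with standard deviation sigma at every point.
# (analytic_sd is an illustrative name, not from the original script.)
analytic_sd <- function(x, sigma = 1) {
  n   <- length(x)
  sxx <- sum((x - mean(x))^2)   # spread of the x positions about their mean
  c(intercept = sigma * sqrt(1 / n + mean(x)^2 / sxx),
    gradient  = sigma / sqrt(sxx))
}

sd_even  <- analytic_sd(seq(1, 10))
sd_heavy <- analytic_sd(c(rep(1, 5), rep(10, 5)))

print(round(rbind(even_spaced = sd_even, heavy_ended = sd_heavy), 4))
```

This gives roughly 0.683 and 0.110 for the intercept and gradient in the
evenly spaced case, and roughly 0.499 and 0.070 in the heavy-ended case, in
line with the simulated values above. Both formulas shrink as the x positions
spread further from their mean, which is why piling measurements at the two
ends helps when the underlying function really is a straight line.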

Script follows:

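# Simulate one noisy data set at the given x positions and fit a line to it.
# Returns the fitted coefficients: intercept first, then gradient.
line_fit <- function(x_seq)
{
  # true line is y = 100 + x, plus a standard normal deviate at each point
  y_seq <- 100 + x_seq + rnorm(length(x_seq))
  return(coef(lm(y_seq~x_seq)))
}

# the two sampling designs, 10 measurements each
even_spaced_x <- seq(1,10)
heavy_ended_x <- c(rep(1,5), rep(10,5))

# 10000 fits per design; each result is a 2 x 10000 matrix
# (row 1 = intercept, row 2 = gradient)
fit1 <- replicate(10000, line_fit(even_spaced_x))
fit2 <- replicate(10000, line_fit(heavy_ended_x))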

cat("mean intercept of fit1:", mean(fit1[1,]), "\n")
cat("sd of intercept of fit1", sd(fit1[1,]), "\n\n")

cat("mean intercept of fit2:", mean(fit2[1,]), "\n")
cat("sd of intercept of fit2", sd(fit2[1,]), "\n\n")

cat("mean gradient of fit1:", mean(fit1[2,]),"\n")
cat("sd of gradient of fit1", sd(fit1[2,]), "\n\n")

cat("mean gradient of fit2:", mean(fit2[2,]), "\n")
cat("sd of gradient of fit2", sd(fit2[2,]), "\n\n")


Cheers

-- David

On 22 January 2015 at 22:20, Keller, Jacob <[email protected]> wrote:

> Dear Crystallographers,
>
> This is more general than crystallography, but has applications therein,
> particularly in understanding fine phi-slicing.
>
> The general question is:
>
> Given one needs to collect data to fit parameters for a known function,
> and given a limited total number of measurements, is it generally better to
> measure a small group of points multiple times or to distribute each
> individual measurement over the measurable extent of the function? I have
> a strong intuition that it is the latter, but all errors being equal, it
> would seem prima facie that both are equivalent. For example, a line (y =
> mx + b) can be fit from two points. One could either measure the line at
> two points A and B five times each for a total of 10 independent
> measurements, or measure ten points evenly-spaced from A to B. Are these
> equivalent in terms of fitting and information content or not? Which is
> better? Again, conjecture and intuition suggest the evenly-spaced
> experiment is better, but I cannot formulate or prove to myself why, yet.
>
> The application of this to crystallography might be another reason that
> fine phi-slicing (0.1 degrees * 3600 frames) is better than coarse (1
> degree * 3600 frames), even though the number of times one measures
> reflections is tenfold higher in the second case (assuming no radiation
> damage). In the first case, one never measures the same phi angle twice,
> but one does have multiple measurements in a sense, i.e., of different
> parts of the same reflection.
>
> Yes, 3D profile-fitting may be a big reason fine phi-slicing works, but
> beyond that, perhaps this sampling choice plays a role as well. Or maybe
> the profile-fitting works so well precisely because of this diffuse-single
> type of sampling rather than coarse-multiple sampling?
>
> This general math/science concept must have been discussed somewhere--can
> anyone point to where?
>
> JPK
>
> *******************************************
> Jacob Pearson Keller, PhD
> Looger Lab/HHMI Janelia Research Campus
> 19700 Helix Dr, Ashburn, VA 20147
> email: [email protected]
> *******************************************
>
