I am looking for a way to characterize a set of data. Each set consists
of many thousands of data points spanning a wide range, over three and
sometimes four orders of magnitude. This analysis will then be used
by others to examine their own experiments: it will allow them to compare
their results with ours. Are they similar? Different? At a confidence
level of 95%?
Excuse my ignorance - I have calculated many Student's t-tests and
p-values, but I have never handled as much data as I am handling now!
Assume for the purposes of this discussion that we have a data set
defined by X(I,J), where I = 1,3000 and J = 1,10.
(J therefore indexes the different data sets, and I refers to
points within each data set.)
Calculate the mean and the standard deviation at each point,
i.e. calculate the average and standard deviation at each I, over
J from 1 to 10.
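For concreteness, here is a minimal sketch of that step in Python/NumPy
(my own illustration; the array X and the lognormal stand-in values are
assumptions, chosen only because the real data span several orders of
magnitude):

    import numpy as np

    # Stand-in for the real data: 3000 points (rows, index I) from each
    # of 10 experiments (columns, index J). Lognormal values are used
    # here only to mimic data spanning several orders of magnitude.
    rng = np.random.default_rng(0)
    X = rng.lognormal(mean=0.0, sigma=2.0, size=(3000, 10))

    # Mean and sample standard deviation at each point I, across J = 1..10.
    point_mean = X.mean(axis=1)        # shape (3000,)
    point_sd = X.std(axis=1, ddof=1)   # ddof=1 gives the n-1 (sample) SD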
Now if I (or someone else)
do an eleventh experiment, i.e. J = 11, how will I know
with some confidence that this 11th experiment is
"similar" to the first 10 experiments? Is it similar (at 95%) if
the value at each I falls within 2 standard deviations of the
mean for that I? (I am assuming that the errors at
each point are truly random, i.e. normally distributed.)
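As one hedged sketch of that check (again my own illustration, not a
validated procedure): under the normality assumption, roughly 95% of
points from a genuinely similar experiment should fall within about 2
standard deviations, so out of 3000 points one should still expect
around 150 to fall outside by chance alone. Counting the fraction of
points inside the band may therefore be more informative than requiring
every single point to pass:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.lognormal(mean=0.0, sigma=2.0, size=(3000, 10))  # first 10 runs
    x_new = rng.lognormal(mean=0.0, sigma=2.0, size=3000)    # stand-in 11th run

    point_mean = X.mean(axis=1)
    point_sd = X.std(axis=1, ddof=1)

    # Fraction of the 11th experiment's points inside mean +/- 2*SD.
    inside = np.abs(x_new - point_mean) <= 2.0 * point_sd
    frac_inside = inside.mean()

    # A genuinely similar run should come out near 95% inside; a much
    # lower fraction is evidence of a real difference.
    print(f"{frac_inside:.1%} of points within 2 SD of the pointwise mean")

One caveat worth flagging: with only 10 prior experiments the pointwise
SD is itself noisy, which is part of what Student's t (which you mention
above) corrects for; t-based intervals come out slightly wider than
plain 2-SD bands.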
Pointers to information, books, or articles that will lead me to
answers to such questions would also be helpful.