Donald F. Burrill <[EMAIL PROTECTED]> wrote:

> Preliminary response:
>
> On 22 Nov 1999 [EMAIL PROTECTED] wrote:
>
>> I am looking for a way to characterize a set of data - each set consists
>> of many thousands of data points spanning a wide range over three,
>> sometimes four, orders of magnitude.
>
> This leads one to wonder whether the original variables, or their
> logarithms, would be the more appropriate metric for descriptive purposes.
> What do the distributions look like?
>
>> This analysis will then be used by others to examine their own
>> experiments - will allow them to compare their results with ours - Are
>> they similar?  Different?  with a confidence level of 95%
>> Excuse my ignorance - I have calculated many Student's T Test,
>> p values - but have never handled so much data as I am now!
>>
>> Assume for the purpose of this discussion that we have a data set
>> defined by X(I,J) where I = 1,3000 and J=1,10
>>
>> (J will therefore be the different data sets, I will refer to
>> points within each data set)
>>
>> Calculate the mean and the standard deviation at each point -
>> i.e. calculate average and standard deviation at each I for
>> J from 1 to 10
>
> This would imply that the only characteristics of interest are
> the mean and s.d.  Is that really so?  (If the 10 distributions are all
> (approximately) Gaussian (aka "normal"), these are all you need; but if
> they are not, rather more descriptive information is probably needed.
> As remarked above, the fact that your range extends over several orders
> of magnitude seems to suggest that the distributions probably are NOT
> Gaussian.)
>
>> Now if I (or someone else) do an eleventh experiment, i.e. J = 11, how
>> will I know with some confidence that this 11th experiment is "similar"
>> to the 1st 10 experiments?  Is it similar (95 %) if the value at each I
>> falls within 2 standard deviations of the mean for that I?  (I am
>> making the assumption that the errors at each point are truly random
>> i.e. normally distributed)
>
> I don't see how one could tell in advance; so I'll rephrase the question
> to "how can I tell whether this 11th experiment was "similar"...?"
> Sounds to me as though you'd want to test the formal hypothesis that the
> 11th mean is equal to the mean of the previous 10 experiments.
>
> On reflection, I see that I've been assuming that each of your 10 data
> sets contains univariate values whose distribution is of interest; but
> your description is also consistent with having multivariate values, or
> (what is not quite the same thing) having a clutch of subsets, each of
> which is of interest for its own distribution (or parameters thereof).
> If those several orders of magnitude arise from several systematic
> differences within each data set, then a less simple-minded approach than
> what I've outlined above would surely be called for.
> (In particular, if you're dealing with some industrial chemical process,
> there may be different temporal regimes to be dealt with:  startup,
> coming to equilibrium, equilibrium, shutdown, for (surely over-simple)
> example.)
>                                                         -- DFB.
> ------------------------------------------------------------------------
> Donald F. Burrill                                  [EMAIL PROTECTED]
> 348 Hyde Hall, Plymouth State College,             [EMAIL PROTECTED]
> MSC #29, Plymouth, NH 03264                        603-535-2597
> 184 Nashua Road, Bedford, NH 03110                 603-471-7128

--
Snail Mail:                          E-Mail:
Krishnan K. Chittur                  [EMAIL PROTECTED] <---
Chemical Engineering Dept.           http://www.eb.uah.edu/che/
Univ. of Alabama in Huntsville       http://www.eb.uah.edu/~kchittur
Huntsville, AL 35899
(205) 890 6850 (Voice)
(205) 890 6839 (FAX-Chemical Engineering) or (205) 890 6349 (FAX-Chemistry)

"Education is a process in which one loses one's fear of what one does
not know"    Steven A. Rosenberg, Surgeon/Scientist

Do something good for yourself. Be a ChemE. . .
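
A note for readers of the thread: the per-point comparison discussed above
(mean and s.d. at each I over J = 1..10, then a check of an 11th run) can be
sketched in Python. This is only an illustration under assumptions that are
not in the original posts: the values are positive and are compared on a
log10 scale (as the reply suggests might be appropriate), the errors at each
point are roughly Gaussian on that scale, and the data are synthetic. The
function names pointwise_summary and fraction_inside are hypothetical. With
only 10 runs per point, a 95% interval for a new value is the mean plus or
minus t * s * sqrt(1 + 1/10), where t = 2.262 is the two-sided 95% Student's
t value for 9 degrees of freedom; that is about 2.37 s, somewhat wider than
the "2 standard deviations" in the question.

```python
import math
import random
import statistics

# Two-sided 95% Student's t critical value for 9 degrees of freedom (10 runs).
T_975_DF9 = 2.262


def pointwise_summary(runs):
    """Return a list of (mean, sd) of log10 values at each point I.

    runs is a list of J experiments, each a list of I positive values.
    """
    summary = []
    for values in zip(*runs):  # all J values observed at one point I
        logs = [math.log10(v) for v in values]
        summary.append((statistics.mean(logs), statistics.stdev(logs)))
    return summary


def fraction_inside(summary, new_run, n_runs, t_crit=T_975_DF9):
    """Fraction of points in new_run inside the per-point 95% prediction interval."""
    factor = t_crit * math.sqrt(1 + 1 / n_runs)
    inside = 0
    for (mean, sd), value in zip(summary, new_run):
        if abs(math.log10(value) - mean) <= factor * sd:
            inside += 1
    return inside / len(new_run)


# Synthetic stand-in for X(I,J): 3000 points spanning four orders of magnitude,
# 10 runs with multiplicative (log-normal) noise. Not real data.
random.seed(0)
true_values = [10 ** random.uniform(0, 4) for _ in range(3000)]
runs = [[t * 10 ** random.gauss(0, 0.05) for t in true_values] for _ in range(10)]
summary = pointwise_summary(runs)

# An 11th run drawn the same way, and one that is systematically doubled.
similar_run = [t * 10 ** random.gauss(0, 0.05) for t in true_values]
shifted_run = [2 * v for v in similar_run]

print("similar run, fraction inside:", fraction_inside(summary, similar_run, 10))
print("shifted run, fraction inside:", fraction_inside(summary, shifted_run, 10))
```

Even a genuinely similar 11th run should be expected to fall outside the
interval at roughly 5% of the 3000 points, so the fraction of points inside
is a more useful summary than asking whether every point is inside. The
sketch also treats the points as independent and univariate, which, as the
reply notes, may not hold for the real data.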
