Hi all, sorry for the cross-post,

I am using the aggregate variance method to analyse a time
series. This method has the following steps:
a) Calculate the mean of the entire data set, <C>
b) Divide the data set into N bins of width T
c) Calculate the mean of each bin, <Ci> for i=1,2,..., N
d) Sum over all bins {(<Ci> - <C>)^2}
e) Take the square root, divide by N
f) Repeat for next N

You end up with a list of T values and corresponding "aggregated
variance" values. Plotting log T against the log of this variance gives
a line with gradient m or gamma, which is always 0 or negative. This
can be used as an estimator for other constants such as the Hurst
exponent, e.g. gamma is 2H - 2, and it is regarded as an indicator of
self-similarity or fractality in the data.
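
For concreteness, here is a minimal Python sketch of steps (a) to (f) for a
series with no gaps. It assumes NumPy, and it assumes the variance form of
steps (d) and (e): sum of squared deviations divided by N, with no square
root. Whether that is the right reading of step (e) is the question raised
further down. The function names and the white-noise test series are
made up for illustration only.

```python
import numpy as np

def aggregated_variance(x, bin_widths):
    """Variance of the bin means for each bin width T (gap-free series)."""
    x = np.asarray(x, dtype=float)
    grand_mean = x.mean()                       # step (a)
    result = []
    for T in bin_widths:                        # step (f): repeat per width
        n_bins = len(x) // T                    # step (b): N bins of width T
        bins = x[:n_bins * T].reshape(n_bins, T)
        bin_means = bins.mean(axis=1)           # step (c)
        # steps (d)-(e), assumed variance form: no square root taken
        result.append(np.sum((bin_means - grand_mean) ** 2) / n_bins)
    return np.array(result)

def slope_and_hurst(bin_widths, variances):
    """Fit log(variance) against log(T); use gamma = 2H - 2 to get H."""
    gamma, _ = np.polyfit(np.log10(bin_widths), np.log10(variances), 1)
    return gamma, 1.0 + gamma / 2.0

# Illustrative input only: uncorrelated noise, for which H should be near 0.5
rng = np.random.default_rng(0)
x = rng.standard_normal(2 ** 14)
widths = np.array([4, 8, 16, 32, 64, 128])
gamma, H = slope_and_hurst(widths, aggregated_variance(x, widths))
print(gamma, H)
```

For uncorrelated noise the fitted gradient should come out close to -1,
which corresponds to H close to 0.5 under gamma = 2H - 2.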

My problem is this: I have a data set with a lot of gaps or holes in 
the data. In descriptions of the method, I have not discovered what
you are supposed to do if there are data gaps. I can think of five 
possible techniques:

(1) Keep the timespan of bins constant, so a bin is always of width T
and starting at point T*i in the data. If a bin has width T but fewer
than T values (because of data holes), simply calculate the mean from
these fewer values. 

(2) Keep the timespan of bins constant, so a bin is always of width T
and starting at point T*i in the data. If a bin has width T but fewer
than T values, throw this bin away, i.e. discard any bin from which data
is missing, and start the next bin at time T*(i+1).

(3) Keep the timespan of bins constant, so a bin is always of width
T. If a bin has width T but fewer than T values, throw this bin away.
As soon as there is valid data, start the next bin. Thus bin i will not
necessarily start at time point T*i.

(4) Divide the bins such that each has a fixed number of valid data
points, T, but may have variable width greater than or equal to
T. Thus bin i will not necessarily start at time point T*i. Only
continue to fill a bin if there is a small gap between valid data
points, less than or equal to some constant k.

(5) Divide the bins such that each has a fixed number of valid data
points, T, but may have variable width greater than or equal to
T. Thus bin i will not necessarily start at time point T*i. Continue 
to fill a bin even if there is a long gap between valid data points.
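
To make the five options easier to compare, here is a rough Python sketch
of how each one might compute the bin means. It assumes the series sits on
a regular time grid with missing values stored as NaN, which is an
assumption about the data layout, and the function names are hypothetical.
It is one possible reading of each option, and it says nothing about which
option is correct.

```python
import numpy as np

def fixed_grid_bin_means(x, T, discard_incomplete):
    """Options (1) and (2): bin i always covers x[i*T:(i+1)*T]."""
    x = np.asarray(x, dtype=float)
    means = []
    for start in range(0, len(x) - T + 1, T):
        window = x[start:start + T]
        valid = window[~np.isnan(window)]
        # Option (2) keeps only full bins; option (1) keeps any non-empty bin
        if valid.size == T or (valid.size > 0 and not discard_incomplete):
            means.append(valid.mean())
    return np.array(means)

def restart_after_gap_bin_means(x, T):
    """Option (3): width-T bins with no holes, restarting after each gap."""
    x = np.asarray(x, dtype=float)
    means, i = [], 0
    while i + T <= len(x):
        window = x[i:i + T]
        holes = np.flatnonzero(np.isnan(window))
        if holes.size == 0:
            means.append(window.mean())
            i += T
        else:
            i += holes[-1] + 1    # restart just after the last hole seen
    return np.array(means)

def fixed_count_bin_means(x, T, max_gap=None):
    """Options (4) and (5): T valid points per bin.

    max_gap is the constant k of option (4); None means option (5).
    """
    x = np.asarray(x, dtype=float)
    means, current, gap = [], [], 0
    for value in x:
        if np.isnan(value):
            gap += 1
            continue
        if max_gap is not None and gap > max_gap:
            current = []          # gap too long: drop the partly filled bin
        gap = 0
        current.append(value)
        if len(current) == T:
            means.append(np.mean(current))
            current = []
    return np.array(means)

# Made-up series with holes, for illustration only
nan = np.nan
x = [1, 2, nan, 4, 5, 6, nan, nan, 9, 10, 11, 12]
opt1 = fixed_grid_bin_means(x, 3, discard_incomplete=False)
opt2 = fixed_grid_bin_means(x, 3, discard_incomplete=True)
opt3 = restart_after_gap_bin_means(x, 3)
opt4 = fixed_count_bin_means(x, 3, max_gap=1)
opt5 = fixed_count_bin_means(x, 3)
```

On this toy series the five options give different numbers of bins and
different bin means, so the choice can be expected to affect the
aggregated variance.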

Which of these techniques is correct? If you can give a citation of a
book or research paper to back up your answer, this would be
wonderful.

I am also now having a minor crisis about step (e) above, and whether I
really am supposed to take the square root. The source I took the method
from did so, but I can't seem to find this confirmed by any formulae
online. However, I admit to not being very good at maths. Am I calculating
the standard deviation rather than the variance? And is my method wrong?
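
To make the question about step (e) concrete, the small Python check below
compares three quantities computed from the same bin means: the variance,
the standard deviation, and the result of steps (d) and (e) exactly as
written above (square root of the sum, then divide by N). The bin means
are made-up numbers, and the grand mean is taken as the mean of the bin
means, which holds when equal-sized bins cover the whole data set. It only
shows how the three quantities relate, not which one a given description
of the method intends.

```python
import numpy as np

bin_means = np.array([1.0, 3.0, 2.0, 6.0])   # illustrative values only
N = len(bin_means)
grand_mean = bin_means.mean()

sum_sq = np.sum((bin_means - grand_mean) ** 2)   # step (d)

variance = sum_sq / N                # sum divided by N, no square root
std_dev = np.sqrt(sum_sq / N)        # square root of the variance
as_written = np.sqrt(sum_sq) / N     # step (e) as written: sqrt, then / N

print(variance, std_dev, as_written)

# The step (e) quantity is the standard deviation divided by sqrt(N)
print(np.isclose(as_written, std_dev / np.sqrt(N)))
```

Because log(sqrt(v)) = 0.5 * log(v), plotting the standard deviation
instead of the variance would halve the gradient of the log-log line, and
the extra factor of 1/sqrt(N) in the step (e) quantity depends on T as
well, since N changes with the bin width.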

Please help me if you possibly can, and in return you will get my
undying gratitude, a drink of your choice if we're ever in the same
part of the world, and an acknowledgement in my thesis, if you want it.

helen-louise