On 24/09/14 20:16, Martin Maechler wrote:
Rolf Turner <r.tur...@auckland.ac.nz>
     on Wed, 24 Sep 2014 18:43:34 +1200 writes:

     > On 24/09/14 17:31, Mohan Radhakrishnan wrote:
     >> Hi,
     >>
     >> I have streaming data(1 TB) that can't fit in memory. Is
     >> there a way for me to find the median of these streaming
     >> integers assuming I can fit only a small part in memory ?
     >> This is about the statistical approach to find the median
     >> of a large number of values when I can inspect only a
     >> part of them due to memory constraints.

     > You cannot, I'm pretty sure, calculate the median
     > recursively.  However there are "approximate" recursive
     > median algorithms which provide an estimate of location
     > that has the same asymptotic properties as the median.

     > See:

     > * U. Holst, Recursive estimators of location.
     > Commun. Statist. Theory Meth., vol. 16, 1987,
     > pp. 2201--2226.

     > and

     > * Murray A. Cameron and T. Rolf Turner, Recursive location
     > and scale estimators, Commun. Statist. Theory Meth.,
     > vol. 22, 1993, pp. 2503--2515.

This is really interesting to me, thank you, Rolf!

OTOH,

1) has your proposal ever been provided in R?
    I'd be happy to add it to the robustX
    (http://cran.ch.r-project.org/web/packages/robustX) or even
    robustbase (http://cran.ch.r-project.org/web/packages/robustbase) package.

I coded the stuff up when Murray and I wrote the paper referred to, but I'm not sure if I'll be able to find the code now. I think I was probably using Splus r.t. R at the time.

I'll have a bit of a search in my ancient archives and see what I can find.

2) Would anybody know of more recent research on the subject?
    (I quickly "googled around" and found research more geared
     for the time series situation which is more involved anyway)

    --> Hence CC'ing the experts' list  R-SIG-robust

I don't know of anything more recent. Some sort of citation search might help to turn up things that are out there.

cheers,

Rolf


--
Rolf Turner
Technical Editor ANZJS

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to