On 24/09/14 17:31, Mohan Radhakrishnan wrote:
Hi,

I have streaming data (1 TB) that can't fit in memory. Is there a way for
me to find the median of these streaming integers, assuming I can fit only
a small part in memory? This is about the statistical approach to finding
the median of a large number of values when, because of memory
constraints, I can inspect only part of them at a time.

You cannot, I'm pretty sure, calculate the median recursively, that is, by updating a fixed-size summary as each new value arrives. However, there are "approximate" recursive median algorithms, which provide an estimate of location with the same asymptotic properties as the median.

See:

* U. Holst, Recursive estimators of location, Commun. Statist. Theory Meth., vol. 16, 1987, pp. 2201--2226.

and

* Murray A. Cameron and T. Rolf Turner, Recursive location and scale estimators, Commun. Statist. Theory Meth., vol. 22, 1993, pp. 2503--2515.
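As a rough illustration of the kind of estimator meant above, here is a minimal Python sketch of a generic stochastic-approximation (Robbins-Monro) recursive median update, m_n = m_{n-1} + (c/n) * sign(x_n - m_{n-1}). This particular update is my assumption for illustration; it is not claimed to be the estimator in the Holst or Cameron and Turner papers, and the function name `recursive_median` and its `step_scale` parameter (the constant c) are hypothetical. It keeps only the current estimate and a count, so memory use does not grow with the length of the stream.

```python
import random


def recursive_median(stream, step_scale=1.0):
    """Approximate the median of `stream` in one pass with O(1) memory.

    Hypothetical sketch of a Robbins-Monro style update (an assumption,
    not necessarily the estimator from the cited papers):
        m_n = m_{n-1} + (step_scale / n) * sign(x_n - m_{n-1})
    """
    it = iter(stream)
    try:
        m = float(next(it))  # seed the estimate with the first value
    except StopIteration:
        raise ValueError("recursive_median() needs a non-empty stream")
    n = 1
    for x in it:
        n += 1
        # Move a shrinking step towards the new value; ties leave m unchanged.
        if x > m:
            m += step_scale / n
        elif x < m:
            m -= step_scale / n
    return m


if __name__ == "__main__":
    random.seed(1)
    # Simulated stream: a generator, so the values are never all in memory.
    data = (random.gauss(0.0, 1.0) for _ in range(100_000))
    print(recursive_median(data, step_scale=2.0))  # should be close to 0
```

The constant c matters. Standard Robbins-Monro theory gives the usual 1/sqrt(n) rate only when c > 1/(4 f(m)), where f(m) is the density at the median, and c = 1/(2 f(m)) matches the sample median's asymptotic variance 1/(4 n f(m)^2). Since f(m) is unknown in practice, c here is simply a user-chosen guess on the scale of the data.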

cheers,

Rolf Turner

--
Rolf Turner
Technical Editor ANZJS

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
