On 24/09/14 17:31, Mohan Radhakrishnan wrote:
Hi,
I have streaming data (1 TB) that can't fit in memory. Is there a
way for me to find the median of these streaming integers, assuming I can
fit only a small part in memory? I am asking about a statistical approach to
finding the median of a large number of values when I can inspect only a part
of them at a time due to memory constraints.
You cannot, I'm pretty sure, calculate the exact median recursively (that
is, by updating a running value one observation at a time). However,
there are "approximate" recursive median algorithms which provide an
estimate of location that has the same asymptotic properties as the median.
See:
* U. Holst, Recursive estimators of location, Commun. Statist. Theory
Meth., vol. 16, 1987, pp. 2201--2226.
and
* Murray A. Cameron and T. Rolf Turner, Recursive location and scale
estimators, Commun. Statist. Theory Meth., vol. 22, 1993,
pp. 2503--2515.
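
To give a flavour of what a recursive location estimate looks like, here is
a minimal sketch in Python of a generic stochastic-approximation update: the
running estimate is nudged up or down by a shrinking step, using only the
sign of each new observation's residual. This is an illustration of the
general idea and not necessarily the estimator studied in either paper above.
The function name recursive_median and the tuning parameter step are
assumptions made for this example. The step scale has to be chosen to suit
the spread of the data; for this simple scheme a value near 1/(2 f(m)),
where f(m) is the density at the median, is the usual textbook guideline.

```python
import random


def recursive_median(stream, step=1.0):
    """Approximate the median of `stream` in one pass with O(1) memory.

    `step` is a hypothetical tuning constant (an assumption of this sketch);
    it scales the 1/n step sizes and should reflect the spread of the data.
    Returns None for an empty stream.
    """
    estimate = None
    for n, x in enumerate(stream, start=1):
        if estimate is None:
            # Start from the first observation.
            estimate = float(x)
            continue
        # Move toward x by a shrinking amount; only the sign of the
        # residual (x - estimate) is used, never its magnitude.
        if x > estimate:
            estimate += step / n
        elif x < estimate:
            estimate -= step / n
    return estimate


if __name__ == "__main__":
    # Simulated stream of integers, generated lazily so that it is never
    # held in memory all at once.
    rng = random.Random(1)
    stream = (rng.randint(0, 1000) for _ in range(200_000))
    print(recursive_median(stream, step=500.0))
```

Only the current estimate and a counter are kept, so memory use does not
grow with the length of the stream. The price is that the result is an
estimate whose accuracy depends on the step scale and the number of
observations, not the exact sample median.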
cheers,
Rolf Turner
--
Rolf Turner
Technical Editor ANZJS