I meant the papers. I hit a paywall. Can we reconstruct the code from the papers ?
Thanks, Mohan On Wed, Sep 24, 2014 at 3:59 PM, Martyn Byng <martyn.b...@nag.co.uk> wrote: > Something else that might be of interest ... > > Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in > high speed data streams Proceedings of the 19th International Conference on > Scientific and Statistical Database Management IEEE Computer Society 29 > > -----Original Message----- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] > On Behalf Of Martin Maechler > Sent: 24 September 2014 09:17 > To: Rolf Turner > Cc: R-help@r-project.org; r-sig-rob...@r-project.org > Subject: Re: [R] Median of streaming data > > >>>>> Rolf Turner <r.tur...@auckland.ac.nz> > >>>>> on Wed, 24 Sep 2014 18:43:34 +1200 writes: > > > On 24/09/14 17:31, Mohan Radhakrishnan wrote: > >> Hi, > >> > >> I have streaming data(1 TB) that can't fit in memory. Is > >> there a way for me to find the median of these streaming > >> integers assuming I can fit only a small part in memory ? > >> This is about the statistical approach to find the median > >> of a large number of values when I can inspect only a > >> part of them due to memory constraints. > > > You cannot, I'm pretty sure, calculate the median > > recursively. However there are "approximate" recursive > > median algorithms which provide an estimate of location > > that has the same asymptotic properties as the median. > > > See: > > > * U. Holst, Recursive estimators of location. > > Commun. Statist. Theory Meth., vol. 16, 1987, > > pp. 2201--2226. > > > and > > > * Murray A. Cameron and T. Rolf Turner, Recursive location > > and scale estimators, Commun. Statist. Theory Meth., > > vol. 22, 1993, pp. 2503--2515. > > This is really interesting to me, thank you, Rolf! > > OTOH, > > 1) has your proposal ever been provided in R? > I'd be happy to add it to the robustX > (http://cran.ch.r-project.org/web/packages/robustX) or even > robustbase (http://cran.ch.r-project.org/web/packages/robustbase) > package. > > 2) Would anybody know of more recent research on the subject? > (I quickly "googled around" and found research more geared > for the time series situation which is more involved anyway) > > --> Hence CC'ing the experts' list R-SIG-robust > > > Martin Maechler, ETH Zurich > > > > cheers, > > Rolf Turner > > > -- > > Rolf Turner Technical Editor ANZJS > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > ________________________________________________________________________ > This e-mail has been scanned for all viruses by Star.\ _...{{dropped:3}} > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.