Hi Ted,

Thinking further along your lines, another idea came to mind. The next time a major release is made (there is one scheduled quite soon, actually), the core team could add a one-question "survey" to the download page for base R: "Please click here if this is the first computer you are downloading R for." Combined with the visitor's IP address, which the server sees anyway and which gives rough geographic information, this could yield a fairly good rough estimate of how many "major-release downloaders" the R community has.
Tal

On Sun, Mar 8, 2009 at 6:11 PM, Ted Harding <ted.hard...@manchester.ac.uk> wrote:

> On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
> > On 08/03/2009 10:49 AM, hadley wickham wrote:
> >>> More seriously : I don't think relative numbers of package downloads
> >>> can be interpreted in any reasonable way, because reasons for
> >>> package download have a very wide range from curiosity ("what's
> >>> this ?"), fun (think "fortunes"...), to vital need (think lme4
> >>> if/when a consensus on denominator DFs can be reached :-)...).
> >>> What can you infer in good faith from such a mess ?
> >>
> >> So when we have messy data with measurement error, we should just
> >> give up? Doesn't sound very statistical! ;)
> >
> > I think the situation is worse than messy. If a client comes in with
> > data that doesn't address the question they're interested in, I think
> > they are better served to be told that, than to be given an answer that
> > is not actually valid. They should also be told how to design a study
> > that actually does address their question.
> >
> > You (and others) have mentioned Google Analytics as a possible way to
> > address the quality of data; that's helpful. But analyzing bad data
> > will just give bad conclusions.
> > Duncan Murdoch
>
> The population of R users (which we would need to sample in order
> to obtain good data) is probably more elusive than a fish population
> in the ocean -- only partially visible at best, and with an unknown
> proportion invisible.
>
> At least in Fisheries research, there are long established capture
> techniques (from trawling to netting to electro-fishing to ... )
> which can be deployed, for research purposes, in such a way as to
> potentially reach all members of a target population, with at least
> a moderately good approximation to random sampling. What have we
> for R?
>
> Come to think of it, electro-fishing, ...
>
> Suppose R were released with 2 types of cookie embedded in base R.
> Each type is randomly configured, when R is first run, to be Active
> or Inactive (probability of activation to be decided at the design
> stage ... ). Type 1, if active, on a certain date generates an
> event which brings it to the notice of R-Core (e.g. by clandestine
> email or by inducing a bug report). Type 2 acts similarly on a later
> date. If Type 2 acts, it carries with it information as to whether
> there was a Type 1 action along with whether, apparently, the Type 1
> action "succeeded".
>
> We then have, in effect, an analogue of the Mark-Recapture technique
> of population estimation (along with the usual questions about
> equal catchability and so forth).
>
> However, since this sort of thing (which I am not proposing seriously,
> only for the sake of argument) is undoubtedly unethical (and would
> do R's reputation no good if it came to light), I tentatively conclude
> that the population of R users is likely to remain as elusive as ever.
>
> Best wishes to all,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 08-Mar-09                                       Time: 16:11:44
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
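For reference, the Mark-Recapture analogy in Ted's thought experiment corresponds to the classic Lincoln-Petersen estimator: if n1 users are "marked" by the Type 1 event, n2 are "caught" by the Type 2 event, and m of those n2 also carried a Type 1 mark, the population is estimated as N = n1 * n2 / m. Below is a minimal Python sketch, assuming a closed population and equal, independent catchability (the very assumptions Ted flags). The function names, activation probabilities and population size are hypothetical, chosen only for illustration; Chapman's bias-corrected variant is included as a commonly used alternative, not as part of Ted's proposal.

```python
import random


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Lincoln-Petersen estimate of population size: N = n1 * n2 / m."""
    if m <= 0:
        raise ValueError("need at least one recapture (m > 0)")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def simulate(population: int, p1: float, p2: float, seed: int = 1):
    """Hypothetical simulation of the two-cookie scheme.

    Each of `population` users independently has an active Type 1 cookie
    with probability p1 and an active Type 2 cookie with probability p2
    (equal catchability is assumed). Returns (n1, n2, m).
    """
    rng = random.Random(seed)
    marked = {u for u in range(population) if rng.random() < p1}  # Type 1 events
    caught = {u for u in range(population) if rng.random() < p2}  # Type 2 events
    return len(marked), len(caught), len(marked & caught)         # m = recaptures


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 users, 5% activation for each cookie type.
    n1, n2, m = simulate(100_000, 0.05, 0.05)
    print(n1, n2, m)
    print(round(lincoln_petersen(n1, n2, m)), round(chapman(n1, n2, m)))
```

Re-running with different seeds, or with smaller p1 and p2, gives a feel for how the spread of the estimate grows as the number of recaptures m shrinks.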
-- 
----------------------------------------------
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il