Hi Ted,

Thinking about your direction, another idea came to mind.
The next time a major release is made (one is scheduled quite soon,
actually), the core team could add a "survey" to the download page of the
R base package asking just one question:
"Please click here if this is the first computer you are downloading this
package for."
Combined with the fact that, when serving a user, we can obtain their IP
address (which gives geographic information), this could give a pretty nice
rough estimate of how many "major release downloaders" the R community has.
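
To make the tallying concrete, here is a rough sketch in Python. Everything
in it is an assumption for illustration: the record layout (IP address,
country already looked up from the IP, and whether the "first computer" box
was clicked), the function name, and the sample records. It counts each IP
at most once, which is only a crude stand-in for "one user".

```python
from collections import Counter


def tally_first_time_downloads(records):
    """Count distinct IPs that clicked the survey box, grouped by country.

    `records` is a hypothetical iterable of (ip, country, clicked) tuples;
    the country is assumed to have been derived from the IP beforehand.
    """
    seen_ips = set()
    per_country = Counter()
    for ip, country, clicked in records:
        # Ignore downloads without the click, and repeat clicks from one IP.
        if clicked and ip not in seen_ips:
            seen_ips.add(ip)
            per_country[country] += 1
    return per_country


# Made-up sample records (documentation-range IP addresses).
sample = [
    ("192.0.2.1", "IL", True),
    ("192.0.2.1", "IL", True),      # same IP clicking twice
    ("198.51.100.7", "UK", True),
    ("203.0.113.9", "UK", False),   # downloaded, but did not click
]
counts = tally_first_time_downloads(sample)
print(dict(counts))
```

Shared or dynamic IP addresses would bias such a count in both directions,
so it would remain a rough estimate at best.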



Tal








On Sun, Mar 8, 2009 at 6:11 PM, Ted Harding <ted.hard...@manchester.ac.uk> wrote:

> On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
> > On 08/03/2009 10:49 AM, hadley wickham wrote:
> >>> More seriously : I don't think relative numbers of package downloads
> >>> can be interpreted in any reasonable way, because reasons for
> >>> package download have a very wide range from curiosity ("what's
> >>> this ?"), fun (think "fortunes"...), to vital need (think lme4
> >>> if/when a consensus on denominator DFs can be reached :-)...).
> >>> What can you infer in good faith from such a mess ?
> >>
> >> So when we have messy data with measurement error, we should just
> >> give up?  Doesn't sound very statistical! ;)
> >
> > I think the situation is worse than messy.  If a client comes in with
> > data that doesn't address the question they're interested in, I think
> > they are better served to be told that, than to be given an answer that
> > is not actually valid.  They should also be told how to design a study
> > that actually does address their question.
> >
> > You (and others) have mentioned Google Analytics as a possible way to
> > address the quality of data; that's helpful.  But analyzing bad data
> > will just give bad conclusions.
> > Duncan Murdoch
>
> The population of R users (which we would need to sample in order
> to obtain good data) is probably more elusive than a fish population
> in the ocean -- only partially visible at best, and with an unknown
> proportion invisible.
>
> At least in Fisheries research, there are long established capture
> techniques (from trawling to netting to electro-fishing to ... )
> which can be deployed, for research purposes, in such a way as to
> potentially reach all members of a target population, with at least
> a moderately good approximation to random sampling. What have we
> for R?
>
> Come to think of it, electro-fishing, ...
>
> Suppose R were released with 2 types of cookie embedded in base R.
> Each type is randomly configured, when R is first run, to be Active
> or Inactive (probability of activation to be decided at the design
> stage ... ). Type 1, if active, on a certain date generates an
> event which brings it to the notice of R-Core (e.g. by clandestine
> email or by inducing a bug report). Type 2 acts similarly on a later
> date. If Type 2 acts, it carries with it information as to whether
> there was a Type 1 action along with whether, apparently, the Type 1
> action "succeeded".
>
> We then have, in effect, an analogue of the Mark-Recapture technique
> of population estimation (along with the usual questions about
> equal catchability and so forth).
>
> However, since this sort of thing (which I am not proposing seriously,
> only for the sake of argument) is undoubtedly unethical (and would
> do R's reputation no good if it came to light), I tentatively conclude
> that the population of R users is likely to remain as elusive as ever.
>
> Best wishes to all,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 08-Mar-09                                       Time: 16:11:44
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il

