> I think the situation is worse than messy.  If a client comes in with data
> that doesn't address the question they're interested in, I think they are
> better served to be told that, than to be given an answer that is not
> actually valid.  They should also be told how to design a study that
> actually does address their question.
>
> You (and others) have mentioned Google Analytics as a possible way to
> address the quality of data; that's helpful.  But analyzing bad data will
> just give bad conclusions.

 As long as we say 'package Foo is the most downloaded package on
CRAN', and not 'package Foo is the most used package for R', we can
leave it to the user to decide if the latter conclusion follows from
the former. In the absence of actual usage data, I would think download
counts a good approximation to usage. Not that I would risk my life on it.

 Pop music charts are now based on download counts, but I wouldn't
believe they represent the songs that are listened to most often.
Nor would I go so far as to believe they represent the quality of the
songs...

 Should R ask 'Would you like to tell CRAN every time you do
library(foo), so we can do usage counts (no personal data is
transmitted, blah blah)?' I don't think so...

Barry

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
