On 08/03/2009 12:08 PM, Barry Rowlingson wrote:
>> I think the situation is worse than messy.  If a client comes in with data
>> that doesn't address the question they're interested in, I think they are
>> better served to be told that, than to be given an answer that is not
>> actually valid.  They should also be told how to design a study that
>> actually does address their question.
>>
>> You (and others) have mentioned Google Analytics as a possible way to
>> address the quality of data; that's helpful.  But analyzing bad data will
>> just give bad conclusions.

> As long as we say 'package Foo is the most downloaded package on
> CRAN', and not 'package Foo is the most used package for R', we can
> leave it to the user to decide if the latter conclusion follows from
> the former.

But we don't even have that data, since CRAN is distributed across lots of mirrors.

Duncan Murdoch

> In the absence of actual usage data I would think it a
> good approximation. Not that I would risk my life on it.
>
> Pop music charts are now based on download counts, but I wouldn't
> believe they represent the songs that are listened to the most times.
> Nor would I go so far as to believe they represent the quality of the
> songs...
>
> Should R have a 'Would you like to tell CRAN every time you do
> library(foo) so we can do usage counts (no personal data is
> transmitted blah blah) ?'? I don't think so....
>
> Barry

