Re: [R] Why software fails in scientific research

2010-07-01 Thread Jim Lemon

On 07/01/2010 03:29 AM, Dr. David Kirkby wrote:

On 03/01/10 12:23 AM, Sharpie wrote:

John Maindonald wrote:

I came across this notice of an upcoming webinar. The issues identified in
the first paragraph below seem to me exactly those that the R project is
designed to address. The claim that most research software is barely fit
for purpose compared to equivalent systems in the commercial world seems
to me not quite accurate! Comments!


It can be argued that this is a reporting bias. Whenever I inform people 
doing epidemiology with Excel about Ian Buchan's paper on Excel errors:


http://www.nwpho.org.uk/sadb/Poisson%20CI%20in%20spreadsheets.pdf

there is a sort of reflexive disbelief, as though something as widely 
used as Excel could not possibly be wrong. That is to say, most people 
using commercial software, especially the sort that allows them to 
follow a cookbook method and get an acceptable (to supervisors, journal 
editors and paymasters) result simply accept it without question.
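
To see the calculation Buchan examined done in R -- an exact Poisson
confidence interval for an observed count -- something like the following
is enough (the count of 10 is arbitrary, chosen purely for illustration):

x <- 10                                    # an arbitrary observed count
alpha <- 0.05
lower <- qchisq(alpha / 2, 2 * x) / 2           # exact (Garwood) lower limit
upper <- qchisq(1 - alpha / 2, 2 * (x + 1)) / 2 # exact upper limit
c(lower = lower, upper = upper)
poisson.test(x)$conf.int                   # base R cross-check; agrees

Getting the same interval by two independent routes is just the sort of
cross-checking that a spreadsheet cookbook discourages.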


The counterweight to the carefree programming style employed by many 
researchers (I include myself) is the multitude of enquiring eyes that 
find our mistakes and foster a continual refinement of our programs. I 
received one such report just this evening, about yet another thing that 
I had never considered: perfect agreement between rating methods in a 
large trial (see the P.S. below). Thus humanity bootstraps upward. My AUD0.02


Jim
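
P.S. A toy sketch (data invented; not the actual case reported to me) of
why perfect agreement is a corner case worth testing for: when every
rating falls in a single category, chance agreement equals observed
agreement and Cohen's kappa evaluates to 0/0.

r1 <- factor(rep("yes", 20), levels = c("no", "yes"))
r2 <- r1                                    # a second rater, agreeing perfectly
tab <- table(r1, r2)
po <- sum(diag(tab)) / sum(tab)                       # observed agreement: 1
pe <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2   # chance agreement: also 1
(po - pe) / (1 - pe)                                  # kappa: 0/0 = NaN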



Re: [R] Why software fails in scientific research

2010-07-01 Thread Gavin Simpson

On Wed, 2010-06-30 at 11:17 -0700, Bert Gunter wrote:

Just one small additional note below ...

Bert Gunter
Genentech Nonclinical Biostatistics

But a lot of academics are not going to waste their time documenting code
properly, so others can reap the benefits of it. They would rather get on
with the next project, to get the next paper.
 
 
 -- Indeed. My personal experience over 3 decades in industrial (private)
 research is that data analysis is viewed as relatively
 unimportant/straightforward/pedestrian and is left to technicians (or
 postdocs) -- often with what is done being largely dictated by the
 conventions of a particular journal or discipline. The lab heads and
 research directors are responsible for the grand research strategies,
 managing resources, etc. and don't want to waste much time on something that
 routine. So worrying about reproducibility of data analysis code (if there
 is any, given the use of GUI software like Excel) falls beneath their radar.
 
 Clearly there are disciplines (e.g. ecology?) where this may NOT be the
 case.

If ecology is anything to go by (and I am an ecologist, sort of, just
about), there is a large body of the community doing things because i)
that is how they've always been done, or ii) that's what reviewers and
editors expect, with a much smaller group of researchers pushing at the
boundaries of their field to use techniques statisticians and the like
have been using for a very long time.

Reproducible research still makes up a (very, very) small minority of the
work I come across when reviewing papers. But I am encouraged by the
number of people I know who are starting to use tools like R to conduct
their research.

 -- Bert

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Why software fails in scientific research

2010-07-01 Thread Murray M Cooper, PhD

For what it's worth!

A good friend who also happens to be an ecologist told me, "An ecologist
is a statistician who likes to be outside."

Murray M Cooper, PhD
Richland Statistics

- Original Message - 
From: Gavin Simpson gavin.simp...@ucl.ac.uk
To: Bert Gunter gunter.ber...@gene.com
Cc: r-help@r-project.org
Sent: Thursday, July 01, 2010 11:57 AM
Subject: Re: [R] Why software fails in scientific research

[Gavin Simpson's message of 1 July, quoted in full above, trimmed.]


Re: [R] Why software fails in scientific research

2010-07-01 Thread Thomas Adams

OK…

My grandfather, who was a farmer, was outstanding in his field…

Cheers…

Murray M Cooper, PhD wrote:

[Murray's message and the earlier quoted thread, trimmed.]



--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL:  thomas.ad...@noaa.gov

VOICE:  937-383-0528
FAX:    937-383-0033



Re: [R] Why software fails in scientific research

2010-07-01 Thread steven mosher
Thomas,

How popular is R inside of NOAA?




On Thu, Jul 1, 2010 at 11:25 AM, Thomas Adams thomas.ad...@noaa.gov wrote:

[Thomas Adams's message and the earlier quoted thread, trimmed.]





Re: [R] Why software fails in scientific research

2010-06-30 Thread Dr. David Kirkby

On 03/01/10 12:23 AM, Sharpie wrote:

John Maindonald wrote:

I came across this notice of an upcoming webinar. The issues identified in
the first paragraph below seem to me exactly those that the R project is
designed to address. The claim that most research software is barely fit
for purpose compared to equivalent systems in the commercial world seems
to me not quite accurate! Comments!



There's probably a lot of truth in those comments.

Generally speaking, publishing results brings rewards in terms of
promotion, salary and so on. Having your code well documented and under
revision control does not. I don't think any amount of technology will
change that.




I personally feel that a lot of this is a result of failing to publish the
code that was developed to perform research along with the results of the
research. When setting out to start a new project, one can dig up tons of
journal articles that will happily inform how data was gathered, what
equations were used, and wrap it all up with nicely formatted tables and
graphs that show X is correlated to Y.



What these articles fail to report is the code that was developed to filter
and process the raw data and then apply the equations to produce the figures
and tables.  The next generation of researchers that are seeking to extend
the results then end up writing their own code rather than building upon
what has already been done.


But unless code is well documented, it's often quicker to start from
scratch anyway.


The R community has done a tremendous job in encouraging truly reproducible
research through the package system and tools like Sweave, which provide a
means to combine and maintain data, code and reports -- but we need more.

In my opinion, we need to start seeing websites that provide services
similar to github or bitbucket -- but with a focus on scientific research.
I should be able to set up a versioned repository somewhere in the cloud
for my research projects that hosts not only my code, but my data and
reports. I could then choose to make this resource publicly available, and
other researchers could fork my work with a single mouse click and start
collaborating on my project or extend what I've done into a project of
their own.


But a lot of academics are not going to waste their time documenting code 
properly, so others can reap the benefits of it. They would rather get on with 
the next project, to get the next paper.


FTP sites have existed for years. If people want to make their data analysis 
code available, it is not hard. But I think it would need a change of attitude 
more than any technical advance.



And that's my two cents on the state of software in research.

-Charlie


And there is my two pennies!

Dave



Re: [R] Why software fails in scientific research

2010-06-30 Thread Bert Gunter
Just one small additional note below ...

Bert Gunter
Genentech Nonclinical Biostatistics

But a lot of academics are not going to waste their time documenting code
properly, so others can reap the benefits of it. They would rather get on
with the next project, to get the next paper.


-- Indeed. My personal experience over 3 decades in industrial (private)
research is that data analysis is viewed as relatively
unimportant/straightforward/pedestrian and is left to technicians (or
postdocs) -- often with what is done being largely dictated by the
conventions of a particular journal or discipline. The lab heads and
research directors are responsible for the grand research strategies,
managing resources, etc. and don't want to waste much time on something that
routine. So worrying about reproducibility of data analysis code (if there
is any, given the use of GUI software like Excel) falls beneath their radar.

Clearly there are disciplines (e.g. ecology?) where this may NOT be the
case.

-- Bert



[R] Why software fails in scientific research

2010-02-28 Thread John Maindonald
I came across this notice of an upcoming webinar. The issues identified in
the first paragraph below seem to me exactly those that the R project is
designed to address. The claim that most research software is barely fit
for purpose compared to equivalent systems in the commercial world seems
to me not quite accurate! Comments!


WEBINAR SERIES 
A Crack in the Code: Why software fails in scientific research, and how to fix 
it. 
Thursday, March 25, 2010, 3:00 PM GMT 
http://physicsworld.com/cws/go/webinar9


In the 60 years since the invention of the digital computer, millions of lines 
of code have been developed to support scientific research. Although an 
increasingly important part of almost all research projects, most research 
software is barely fit for purpose compared to equivalent systems in the 
commercial world. The code is hard to understand or maintain, lacking 
documentation and version control, and is continually ‘re-invented’ as the code 
writers move on to new jobs. This represents a tremendous waste of the already 
inadequate resources that are put into its development. We will investigate how 
this situation has come about, why it is important to the future of research, 
and what can be done about it. 

Robert McGreevy will draw on his extensive experience at the STFC ISIS 
Facility, and explain how these issues are being addressed for the benefit of 
research science globally. Nicholas Draper, consultant at Tessella, will then 
expand on this, using the example of the Mantid project at ISIS. 


Tessella (www.tessella.com) is a technology and consultancy firm, based in 
Oxford.

ISIS here is not the International Species Information System but the
STFC neutron and muon source at the Rutherford Appleton Laboratory, the
facility that hosts the Mantid project mentioned above.

John Maindonald             email: john.maindon...@anu.edu.au
phone: +61 2 (6125) 3473    fax: +61 2 (6125) 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm



Re: [R] Why software fails in scientific research

2010-02-28 Thread Sharpie


John Maindonald wrote:

[John Maindonald's webinar notice, quoted in full above, trimmed.]

I personally feel that a lot of this is a result of failing to publish the
code that was developed to perform research along with the results of the
research. When setting out to start a new project, one can dig up tons of
journal articles that will happily inform how data was gathered, what
equations were used, and wrap it all up with nicely formatted tables and
graphs that show X is correlated to Y.

What these articles fail to report is the code that was developed to filter
and process the raw data and then apply the equations to produce the figures
and tables.  The next generation of researchers that are seeking to extend
the results then end up writing their own code rather than building upon
what has already been done.

The R community has done a tremendous job in encouraging truly reproducible
research through the package system and tools like Sweave, which provide a
means to combine and maintain data, code and reports -- but we need more.
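
For instance, a minimal Sweave document looks something like the sketch
below (the chunk label and the choice of R's built-in iris data are mine,
purely for illustration). The chunk is re-run every time the report is
built, so the reported number can never drift out of sync with the data:

\documentclass{article}
\begin{document}
Mean sepal length, recomputed from the data on every build:
<<sepal-mean, echo=TRUE>>=
mean(iris$Sepal.Length)
@
\end{document}

Saved as, say, report.Rnw, this is compiled with R CMD Sweave report.Rnw
followed by latex on the resulting report.tex.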

In my opinion, we need to start seeing websites that provide services
similar to github or bitbucket -- but with a focus on scientific research.
I should be able to set up a versioned repository somewhere in the cloud
for my research projects that hosts not only my code, but my data and
reports. I could then choose to make this resource publicly available, and
other researchers could fork my work with a single mouse click and start
collaborating on my project or extend what I've done into a project of
their own.

And that's my two cents on the state of software in research.

-Charlie
