Re: [R] Why software fails in scientific research
On 07/01/2010 03:29 AM, Dr. David Kirkby wrote:

On 03/ 1/10 12:23 AM, Sharpie wrote:

John Maindonald wrote:

I came across this notice of an upcoming webinar. The issues identified in the first paragraph below seem to me exactly those that the R project is designed to address. The claim that most research software is barely fit for purpose compared to equivalent systems in the commercial world seems to me not quite accurate! Comments!

It can be argued that this is a reporting bias. Whenever I inform people doing epidemiology with Excel about Ian Buchan's paper on Excel errors:

http://www.nwpho.org.uk/sadb/Poisson%20CI%20in%20spreadsheets.pdf

there is a sort of reflexive disbelief, as though something as widely used as Excel could not possibly be wrong. That is to say, most people using commercial software, especially the sort that allows them to follow a cookbook method and get an acceptable (to supervisors, journal editors and paymasters) result simply accept it without question.

The counterweight to the carefree programming style employed by many researchers (I include myself) is the multitude of enquiring eyes that find our mistakes, and foster a continual refinement of our programs. I just received one this evening, about yet another thing that I had never considered, perfect agreement by rating methods in a large trial. Thus humanity bootstraps upward.

My AUD0.02

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why software fails in scientific research
On Wed, 2010-06-30 at 11:17 -0700, Bert Gunter wrote:

Just one small additional note below ...

Bert Gunter
Genentech Nonclinical Biostatistics

But a lot of academics are not going to waste their time documenting code properly, so others can reap the benefits of it. They would rather get on with the next project, to get the next paper.

-- Indeed. My personal experience over 3 decades in industrial (private) research is that data analysis is viewed as relatively unimportant/straightforward/pedestrian and is left to technicians (or postdocs) -- often with what is done being largely dictated by the conventions of a particular journal or discipline. The lab heads and research directors are responsible for the grand research strategies, managing resources, etc. and don't want to waste much time on something that routine. So worrying about reproducibility of data analysis code (if there is any, given the use of GUI software like Excel) falls beneath their radar.

Clearly there are disciplines (e.g. ecology?) where this may NOT be the case.

If ecology is anything to go by (and I am an ecologist, sort of, just about), there is a large body of the community doing things because i) that is how they've always been done, or ii) because that's what reviewers/editors expect etc. with a much smaller group of researchers pushing at the boundaries (of their field) to use techniques statisticians and the like have been using for a very long time.

Reproducible research is still very much in the (very, very) small minority of the work I come across reviewing papers etc. But I am encouraged by the number of people I know who are starting to use tools like R to conduct their research.

-- Bert

G

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson        [t] +44 (0)20 7679 0522
ECRC, UCL Geography,     [f] +44 (0)20 7679 0565
Pearson Building,        [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London     [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.            [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] Why software fails in scientific research
For what it's worth! A good friend who also happens to be an ecologist told me: "An ecologist is a statistician who likes to be outside."

Murray M Cooper, PhD
Richland Statistics

- Original Message -
From: Gavin Simpson gavin.simp...@ucl.ac.uk
To: Bert Gunter gunter.ber...@gene.com
Cc: r-help@r-project.org
Sent: Thursday, July 01, 2010 11:57 AM
Subject: Re: [R] Why software fails in scientific research
Re: [R] Why software fails in scientific research
OK… My Grandfather, who was a farmer, was outstanding in his field…

Cheers…

Murray M Cooper, PhD wrote:

For what its worth! A good friend who also happens to be an ecologist told me An ecologist is a statistician who likes to be outside.

--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL: thomas.ad...@noaa.gov
VOICE: 937-383-0528
FAX: 937-383-0033
Re: [R] Why software fails in scientific research
Thomas,

How popular is R inside of NOAA?

On Thu, Jul 1, 2010 at 11:25 AM, Thomas Adams thomas.ad...@noaa.gov wrote:

OK… My Grandfather, who was a farmer, was outstanding in his field…

Cheers…
Re: [R] Why software fails in scientific research
On 03/ 1/10 12:23 AM, Sharpie wrote:

John Maindonald wrote:

I came across this notice of an upcoming webinar. The issues identified in the first paragraph below seem to me exactly those that the R project is designed to address. The claim that most research software is barely fit for purpose compared to equivalent systems in the commercial world seems to me not quite accurate! Comments!

There's probably a lot of truth in those comments. Generally speaking, publishing results gets rewards in terms of promotion, salary etc. Having your code well documented and in revision control systems does not. I don't think any amount of

I personally feel that a lot of this is a result of failing to publish the code that was developed to perform research along with the results of the research. When setting out to start a new project, one can dig up tons of journal articles that will happily inform you how data was gathered, what equations were used and wrap it all up with nicely formatted tables and graphs that show X is correlated to Y. What these articles fail to report is the code that was developed to filter and process the raw data and then apply the equations to produce the figures and tables. The next generation of researchers that are seeking to extend the results then end up writing their own code rather than building upon what has already been done.

But unless code is well documented, it's often quicker to start from scratch anyway.

The R community has done a tremendous job in encouraging truly reproducible research through the package system and tools like Sweave which provide a means to combine and maintain data, code and reports -- but we need more. In my opinion, we need to start seeing websites that provide services similar to github or bitbucket -- but with a focus on scientific research. I should be able to set up a versioned repository somewhere in the cloud for my research projects that hosts not only my code, but my data and reports. I could then choose to make this resource publicly available and other researchers could fork my work with a single mouse click and start collaborating on my project or extend what I've done into a project of their own.

But a lot of academics are not going to waste their time documenting code properly, so others can reap the benefits of it. They would rather get on with the next project, to get the next paper. FTP sites have existed for years. If people want to make their data analysis code available, it is not hard. But I think it would need a change of attitude more than any technical advance.

And that's my two cents on the state of software in research.

-Charlie

And there is my two pennies!

Dave
Re: [R] Why software fails in scientific research
Just one small additional note below ...

Bert Gunter
Genentech Nonclinical Biostatistics

But a lot of academics are not going to waste their time documenting code properly, so others can reap the benefits of it. They would rather get on with the next project, to get the next paper.

-- Indeed. My personal experience over 3 decades in industrial (private) research is that data analysis is viewed as relatively unimportant/straightforward/pedestrian and is left to technicians (or postdocs) -- often with what is done being largely dictated by the conventions of a particular journal or discipline. The lab heads and research directors are responsible for the grand research strategies, managing resources, etc. and don't want to waste much time on something that routine. So worrying about reproducibility of data analysis code (if there is any, given the use of GUI software like Excel) falls beneath their radar.

Clearly there are disciplines (e.g. ecology?) where this may NOT be the case.

-- Bert
[R] Why software fails in scientific research
I came across this notice of an upcoming webinar. The issues identified in the first paragraph below seem to me exactly those that the R project is designed to address. The claim that most research software is barely fit for purpose compared to equivalent systems in the commercial world seems to me not quite accurate! Comments!

WEBINAR SERIES
A Crack in the Code: Why software fails in scientific research, and how to fix it.
Thursday, March 25, 2010, 3:00 PM GMT
http://physicsworld.com/cws/go/webinar9

In the 60 years since the invention of the digital computer, millions of lines of code have been developed to support scientific research. Although an increasingly important part of almost all research projects, most research software is barely fit for purpose compared to equivalent systems in the commercial world. The code is hard to understand or maintain, lacking documentation and version control, and is continually ‘re-invented’ as the code writers move on to new jobs. This represents a tremendous waste of the already inadequate resources that are put into its development. We will investigate how this situation has come about, why it is important to the future of research, and what can be done about it.

Robert McGreevy will draw on his extensive experience at the STFC ISIS Facility, and explain how these issues are being addressed for the benefit of research science globally. Nicholas Draper, consultant at Tessella, will then expand on this, using the example of the Mantid project at ISIS.

Tessella (www.tessella.com) is a technology and consultancy firm, based in Oxford.

ISIS (International Species Information System) (www.isis.org) has as its mission the facilitation of international collaboration in the collection and sharing of knowledge on animals and their environments for zoos, aquariums and related organizations, and values the use of objective data to benefit conservation, science, animal welfare, education, and collection management.

John Maindonald
email: john.maindon...@anu.edu.au
phone: +61 2 (6125)3473
fax: +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
Re: [R] Why software fails in scientific research
John Maindonald wrote:

I came across this notice of an upcoming webinar. The issues identified in the first paragraph below seem to me exactly those that the R project is designed to address. The claim that most research software is barely fit for purpose compared to equivalent systems in the commercial world seems to me not quite accurate! Comments!

I personally feel that a lot of this is a result of failing to publish the code that was developed to perform research along with the results of the research. When setting out to start a new project, one can dig up tons of journal articles that will happily inform you how data was gathered, what equations were used and wrap it all up with nicely formatted tables and graphs that show X is correlated to Y. What these articles fail to report is the code that was developed to filter and process the raw data and then apply the equations to produce the figures and tables. The next generation of researchers that are seeking to extend the results then end up writing their own code rather than building upon what has already been done.

The R community has done a tremendous job in encouraging truly reproducible research through the package system and tools like Sweave which provide a means to combine and maintain data, code and reports -- but we need more. In my opinion, we need to start seeing websites that provide services similar to github or bitbucket -- but with a focus on scientific research. I should be able to set up a versioned repository somewhere in the cloud for my research projects that hosts not only my code, but my data and reports. I could then choose to make this resource publicly available and other researchers could fork my work with a single mouse click and start collaborating on my project or extend what I've done into a project of their own.

And that's my two cents on the state of software in research.

-Charlie