On 06/29/2016 03:42 PM, Thomas Girke wrote:
Yes, a "readSummarizedExperiment" would be a "modern-day analog of
Biobase::readExpressionSet". I also agree with the other suggestions
including github to get this started, and Vince's thoughts on binding
meta-data more tightly to source data as well as improving
interoperability.

I started a repository at

  https://github.com/Bioconductor/TenStepReproducible

I envision this as a package / white paper / eventually publication. Feel free to fork etc., and/or to contribute other ideas.

Martin


As suggested I am sharing this discussion with the bioc-devel list.

Thomas

On Wed, Jun 29, 2016 at 06:22:49PM +0000, Vincent Carey wrote:

Thanks Thomas -- I think this should be circulated to biocore for further
comments. I am in agreement that we need to do a better job of demonstrating
the value of a) binding metadata to data, b) using standard containers
throughout workflows, and c) allowing interoperation. I learned some useful
things about spreadsheet interoperation at the conference and need to learn
more.

In a sense we are giving a specific implementation of some of the rules in

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285

and I wonder whether we could come up with another topic for the "ten simple
rules" series that addresses these concerns, or do something similar, perhaps
for F1000Research, with a Bioconductor-interoperability focus on metadata.


On Wed, Jun 29, 2016 at 06:28:49PM +0000, Martin Morgan wrote:

I guess you mean a modern-day analog of Biobase::readExpressionSet? I
like the idea of templates, and also of drafting a 'Ten Steps Toward
Reproducibility in R / Bioconductor'. I would be happy to start a GitHub
repo for it if there are any takers...

Martin

This email message may contain legally privileged and/or confidential
information.  If you are not the intended recipient(s), or the employee
or agent responsible for the delivery of this message to the intended
recipient(s), you are hereby notified that any disclosure, copying,
distribution, or use of this email message is prohibited.  If you have
received this message in error, please notify the sender immediately by
e-mail and delete this email message from your computer. Thank you.


On 06/29/2016 01:57 PM, Thomas Girke wrote:
Hi Vince and Martin,

It was great seeing you at the Bioc conference, and thanks for all your
time organizing the conference. As always it was a great success with a
lot of inspiring presentations and discussions.

In one of our discussions you asked me for feedback on why I think handling
of meta-data is currently not straightforward for non-expert users of
Bioc packages, such as biologists, data analysts or developers coming
from other languages.

In my opinion, one main reason for this difficulty is that there is no
formal utility for importing meta-data from external files
(e.g. tabular, JSON or other formats). SummarizedExperiment has all
these great functionalities, but it is not obvious to non-expert users
how to import their data into the final object. A developer can easily
write a custom import function, but a non-R programmer cannot.
This need could be addressed simply by providing an import function
that reads user-provided meta-data (optionally along with assay/range
data) directly into SummarizedExperiment objects (and/or
RangedSummarizedExperiment). To the best of my knowledge, a
readSummarizedExperiment is currently not available, but I might be wrong?

Almost equally important would be an export function so that users can
easily report intermediate results and also share them with external
software outside of R. Clearly, for the latter need, exporting to an Rda
file is not an option.

The import step in particular overlaps substantially with how we communicate
with experimentalists via spreadsheets, a topic we discussed quite a bit at
the meeting. Providing one or two best-practice templates for organizing
experiments in the 'spirit' of SummarizedExperiment could help educate
scientists on how to format their meta-data in Excel or Google Sheets so
that it is easier to process. This would also improve reproducibility,
since many sample-handling errors happen right at this level. As an example
file, one could use the sample colData from the current
SummarizedExperiment vignette.

That's really all.

Best,

Thomas


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
