On 03/11/2014 02:52 PM, Hervé Pagès wrote:
On 03/11/2014 09:57 AM, Valerie Obenchain wrote:
Hi Herve,

On 03/10/2014 10:31 PM, Hervé Pagès wrote:
Hi Val,

I think it would help understand the motivations behind this proposal
if you could give an example of a method where the user cannot supply
a file name but has to create a 'File' (or 'FileList') object first.
And how the file registry proposal below would help.
It looks like you have such an example in the GenomicFileViews package.
Do you think you could give more details?

The most recent motivating use case was in creating subclasses of
GenomicFileViews objects (BamFileViews, BigWigFileViews, etc.). We wanted
to have a general constructor, something like GenomicFileViews(), that
would create the appropriate subclass. However, to create the correct
subclass we needed to know if the files were bam, bw, fasta etc.
Recognition of the file type by extension would allow us to do this with
no further input from the user.

That helps, thanks!

Having this kind of general constructor sounds like it could indeed be
useful. Would be an opportunity to put all these *File classes (the 22
RTLFile subclasses defined in rtracklayer and the 5 RsamtoolsFile
subclasses defined in Rsamtools) under the same umbrella (i.e. a parent
virtual class) and use the name of this virtual class (e.g. File) for
the general constructor.
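
A minimal sketch of what such an umbrella class and general constructor
could look like. The class names ToyBamFile and ToyBigWigFile and the
lookup table .knownFileClasses are hypothetical stand-ins, not the real
Rsamtools or rtracklayer definitions, and the extension-to-class table
is hard-coded on the assumption that no registration mechanism is used.

```r
library(methods)

# Hypothetical umbrella: a virtual parent class with two toy subclasses.
# These stand in for the real *File classes and are NOT their definitions.
setClass("File", representation("VIRTUAL", path = "character"))
setClass("ToyBamFile", contains = "File")
setClass("ToyBigWigFile", contains = "File")

# Hard-coded extension-to-class table, as it might live inside File().
.knownFileClasses <- c(bam = "ToyBamFile", bw = "ToyBigWigFile")

# General constructor: picks the concrete subclass from the file extension.
# Assumes a single file name.
File <- function(path) {
    ext <- tolower(tools::file_ext(path))
    cls <- .knownFileClasses[ext]
    if (is.na(cls))
        stop("no File subclass known for extension '", ext, "'")
    new(unname(cls), path = path)
}

x <- File("foo.bam")
is(x, "ToyBamFile")   # TRUE
is(x, "File")         # TRUE
```

Supporting a new subclass under this design means editing
.knownFileClasses by hand, which is the trade-off discussed next.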

Allowing a registration mechanism to extend the knowledge of this File()
constructor is an implementation detail. I don't see a lot of benefit to
it. Only a package that implements a concrete File subclass would
actually need to register the new subclass. Sounds easy enough to ask
whoever has commit access to the File() code to modify it. This
kind of update might also require adding the name of the package where
the new File subclass is implemented to the Depends/Imports/Suggests
of the package where File() lives, which is something that cannot be
done via a registration mechanism.

This clean-up of the *File jungle would also be a good opportunity to:

  - Choose what we want to do with reference classes: use them for all
    the *File classes or for none of them. (Right now, those defined
    in Rsamtools are reference classes, and those defined in
    rtracklayer are not.)

  - Move the I/O functionality currently in rtracklayer to a
    separate package. Based on the number of contributed packages I
    reviewed so far that were trying to reinvent the wheel because
    they had no idea that the I/O function they needed was actually
    in rtracklayer, I'd like to advocate for using a package name
    that makes it very clear that it's all about I/O.
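
For the first bullet, the practical difference between the two styles is
copy semantics. The sketch below uses two toy classes (ValueFile and
RefFile, both hypothetical and unrelated to the real *File classes) to
show how an ordinary S4 object and a reference class object behave when
a copy is modified.

```r
library(methods)

# Value semantics (ordinary S4): modifying a copy leaves the original alone.
setClass("ValueFile", representation(path = "character", isOpen = "logical"))
v1 <- new("ValueFile", path = "foo.bam", isOpen = FALSE)
v2 <- v1
v2@isOpen <- TRUE
v1@isOpen   # FALSE: v1 is unaffected

# Reference semantics: both names refer to the same underlying object.
RefFile <- setRefClass("RefFile",
                       fields = list(path = "character", isOpen = "logical"))
r1 <- RefFile$new(path = "foo.bam", isOpen = FALSE)
r2 <- r1
r2$isOpen <- TRUE
r1$isOpen   # TRUE: the change is visible through r1 as well
```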

H.



Val


Thanks,
H.


On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
Hi all,

I'm soliciting feedback on the idea of a general file 'registry' that
would identify file types by their extensions. This is similar in spirit
to FileForFormat() in rtracklayer but a more general abstraction that
could be used across packages. The goal is to allow a user to supply
only file name(s) to a method instead of first creating a 'File' object
such as BamFile, FaFile, BigWigFile etc.

A first attempt at this is in the GenomicFileViews package
(https://github.com/Bioconductor/GenomicFileViews). A registry (lookup)
is created as an environment at load time:

.fileTypeRegistry <- new.env(parent=emptyenv())

Files are registered with an information triplet consisting of class,
package and regular expression to identify the extension. In
GenomicFileViews we register FaFileList, BamFileList and BigWigFileList
but any 'File' class can be registered that has a constructor of the
same name.

.onLoad <- function(libname, pkgname)
{
     registerFileType("FaFileList", "Rsamtools", "\\.fa$")
     registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
     registerFileType("BamFileList", "Rsamtools", "\\.bam$")
     registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
}
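
The body of registerFileType() is not shown in this message. A minimal
sketch of how it could store the triplet, assuming each entry is keyed
by its regular expression so that registering the same pattern twice
overwrites the earlier entry (the actual GenomicFileViews code may
differ):

```r
.fileTypeRegistry <- new.env(parent = emptyenv())

# Sketch only: store one (class, package, regex) triplet per pattern.
registerFileType <- function(class, package, regex) {
    stopifnot(is.character(class), length(class) == 1L,
              is.character(package), length(package) == 1L,
              is.character(regex), length(regex) == 1L)
    # Keying on the regex is an assumption made for this sketch.
    assign(regex,
           list(class = class, package = package, regex = regex),
           envir = .fileTypeRegistry)
    invisible(regex)
}

registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")

# Inspect one entry
get("\\.bam$", envir = .fileTypeRegistry)$class
```

Only character strings are stored here, so registering a type does not
require the named package to be loaded or even installed.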

The makeFileType() helper creates the appropriate class. This function
is used behind the scenes to do the lookup and coerce to the correct
'File' class.

 > makeFileType(c("foo.bam", "bar.bam"))
BamFileList of length 2
names(2): foo.bam bar.bam
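
The lookup behind makeFileType() could be sketched as follows. To keep
the example runnable without Rsamtools, it registers a toy constructor
named ToyBamFileList (hypothetical) and assumes that all supplied file
names match the same single pattern. A real version would presumably
fetch the constructor from the registered package, for instance with
getExportedValue(), rather than by name from the search path.

```r
# Toy registry with a single entry; the package field is unused here.
.fileTypeRegistry <- new.env(parent = emptyenv())
assign("\\.bam$",
       list(class = "ToyBamFileList", package = NA_character_),
       envir = .fileTypeRegistry)

# Stand-in constructor with the same name as the registered class.
ToyBamFileList <- function(paths)
    structure(list(paths = paths), class = "ToyBamFileList")

# Sketch only: find the one pattern matching every path, then construct.
makeFileType <- function(paths) {
    patterns <- ls(.fileTypeRegistry, all.names = TRUE)
    hits <- Filter(function(p) all(grepl(p, paths)), patterns)
    if (length(hits) != 1L)
        stop("expected exactly one matching file type, found ", length(hits))
    entry <- get(hits, envir = .fileTypeRegistry)
    # Look the constructor up by name; see the note above about packages.
    constructor <- get(entry$class, mode = "function")
    constructor(paths)
}

x <- makeFileType(c("foo.bam", "bar.bam"))
class(x)
```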

New types can be added at any time with registerFileType():

registerFileType("NewClass", "NewPackage", "\\.NewExtension$")


Thoughts:

(1) If this sounds generally useful where should it live? rtracklayer,
GenomicFileViews or other? Alternatively it could be its own lightweight
package (FileRegister) that creates the registry and provides the
helpers. It would be up to the package authors that depend on
FileRegister to register their own file types at load time.

(2) To avoid potential ambiguities maybe searching should be by regex
and package name. Still a work in progress.
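
One way the ambiguity in (2) could be handled, sketched with a plain
list instead of the environment. The second entry (OtherBwList in
otherPkg) is invented purely to create a clash, and findFileType() is a
hypothetical helper name.

```r
# Two packages claim the same extension; the second entry is made up.
registry <- list(
    list(class = "BigWigFileList", package = "rtracklayer", regex = "\\.bw$"),
    list(class = "OtherBwList",    package = "otherPkg",    regex = "\\.bw$")
)

# Sketch only: match on regex first, then narrow by package if supplied.
findFileType <- function(path, package = NULL) {
    hits <- Filter(function(e) grepl(e$regex, path), registry)
    if (!is.null(package))
        hits <- Filter(function(e) e$package == package, hits)
    if (length(hits) == 0L)
        stop("no registered file type matches '", path, "'")
    if (length(hits) > 1L) {
        pkgs <- vapply(hits, function(e) e$package, character(1))
        stop("ambiguous match in packages: ",
             paste(pkgs, collapse = ", "), "; supply 'package'")
    }
    hits[[1L]]$class
}

findFileType("signal.bw", package = "rtracklayer")   # "BigWigFileList"
```

Without the package argument the call above fails with an error rather
than silently picking one of the two classes.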


Valerie

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
