On Tue, Mar 11, 2014 at 3:33 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
> On 03/11/2014 02:52 PM, Hervé Pagès wrote:
>
>> On 03/11/2014 09:57 AM, Valerie Obenchain wrote:
>>
>>> Hi Herve,
>>>
>>> On 03/10/2014 10:31 PM, Hervé Pagès wrote:
>>>
>>>> Hi Val,
>>>>
>>>> I think it would help understand the motivations behind this proposal
>>>> if you could give an example of a method where the user cannot supply
>>>> a file name but has to create a 'File' (or 'FileList') object first.
>>>> And how the file registry proposal below would help.
>>>> It looks like you have such an example in the GenomicFileViews package.
>>>> Do you think you could give more details?
>>>>
>>>
>>> The most recent motivating use case was in creating subclasses of
>>> GenomicFileViews objects (BamFileViews, BigWigFileViews, etc.) We wanted
>>> to have a general constructor, something like GenomicFileViews(), that
>>> would create the appropriate subclass. However to create the correct
>>> subclass we needed to know if the files were bam, bw, fasta etc.
>>> Recognition of the file type by extension would allow us to do this with
>>> no further input from the user.
>>>
>>
>> That helps, thanks!
>>
>> Having this kind of general constructor sounds like it could indeed be
>> useful. Would be an opportunity to put all these *File classes (the 22
>> RTLFile subclasses defined in rtracklayer and the 5 RsamtoolsFile
>> subclasses defined in Rsamtools) under the same umbrella (i.e. a parent
>> virtual class) and use the name of this virtual class (e.g. File) for
>> the general constructor.
>>
>> Allowing a registration mechanism to extend the knowledge of this File()
>> constructor is an implementation detail. I don't see a lot of benefit to
>> it. Only a package that implements a concrete File subclass would
>> actually need to register the new subclass. Sounds easy enough to ask
>> to whoever has commit access to the File() code to modify it.
>> This kind of update might also require adding the name of the package
>> where the new File subclass is implemented to the Depends/Imports/Suggests
>> of the package where File() lives, which is something that cannot be
>> done via a registration mechanism.
>>
>
> This clean-up of the *File jungle would also be a good opportunity to:
>
>   - Choose what we want to do with reference classes: use them for all
>     the *File classes or for none of them. (Right now, those defined
>     in Rsamtools are reference classes, and those defined in
>     rtracklayer are not.)
>
>   - Move the I/O functionality currently in rtracklayer to a
>     separate package. Based on the number of contributed packages I
>     reviewed so far that were trying to reinvent the wheel because
>     they had no idea that the I/O function they needed was actually
>     in rtracklayer, I'd like to advocate for using a package name
>     that makes it very clear that it's all about I/O.
>
>

I can see some benefit in renaming/reorganizing, but if they weren't able
to perform a simple Google search for functionality, I don't think the name
of the package was the problem. "read gff bioconductor" returns rtracklayer
as the top hit.

> H.
>
>
>
>> H.
>>
>>
>>
>>> Val
>>>
>>>
>>>> Thanks,
>>>> H.
>>>>
>>>>
>>>> On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm soliciting feedback on the idea of a general file 'registry' that
>>>>> would identify file types by their extensions. This is similar in
>>>>> spirit to FileForFormat() in rtracklayer but a more general
>>>>> abstraction that could be used across packages. The goal is to allow
>>>>> a user to supply only file name(s) to a method instead of first
>>>>> creating a 'File' class such as BamFile, FaFile, BigWigFile etc.
>>>>>
>>>>> A first attempt at this is in the GenomicFileViews package
>>>>> (https://github.com/Bioconductor/GenomicFileViews).
>>>>> A registry (lookup) is created as an environment at load time:
>>>>>
>>>>>     .fileTypeRegistry <- new.env(parent=emptyenv())
>>>>>
>>>>> Files are registered with an information triplet consisting of class,
>>>>> package and regular expression to identify the extension. In
>>>>> GenomicFileViews we register FaFileList, BamFileList and BigWigFileList
>>>>> but any 'File' class can be registered that has a constructor of the
>>>>> same name.
>>>>>
>>>>>     .onLoad <- function(libname, pkgname)
>>>>>     {
>>>>>         registerFileType("FaFileList", "Rsamtools", "\\.fa$")
>>>>>         registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
>>>>>         registerFileType("BamFileList", "Rsamtools", "\\.bam$")
>>>>>         registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
>>>>>     }
>>>>>
>>>>> The makeFileType() helper creates the appropriate class. This function
>>>>> is used behind the scenes to do the lookup and coerce to the correct
>>>>> 'File' class.
>>>>>
>>>>>     > makeFileType(c("foo.bam", "bar.bam"))
>>>>>     BamFileList of length 2
>>>>>     names(2): foo.bam bar.bam
>>>>>
>>>>> New types can be added at any time with registerFileType():
>>>>>
>>>>>     registerFileType(NewClass, NewPackage, "\\.NewExtension$")
>>>>>
>>>>>
>>>>> Thoughts:
>>>>>
>>>>> (1) If this sounds generally useful where should it live? rtracklayer,
>>>>> GenomicFileViews or other? Alternatively it could be its own
>>>>> lightweight package (FileRegister) that creates the registry and
>>>>> provides the helpers. It would be up to the package authors that
>>>>> depend on FileRegister to register their own file types at load time.
>>>>>
>>>>> (2) To avoid potential ambiguities maybe searching should be by regex
>>>>> and package name. Still a work in progress.
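
To make the lookup concrete, here is a minimal base-R sketch of how such a
registry might behave. It is not the GenomicFileViews code: findFileType()
is a hypothetical helper, and keying entries by package plus regex is only
my assumption about how point (2) could be handled. It returns the
registered entry rather than constructing anything, so it runs without
Rsamtools or rtracklayer installed.

```r
# Sketch only: a registry keyed by package + regex (an assumption, not
# necessarily how GenomicFileViews stores its entries).
.fileTypeRegistry <- new.env(parent = emptyenv())

# Store a (class, package, regex) triplet in the registry.
registerFileType <- function(class, package, regex) {
    key <- paste(package, regex, sep = "::")
    assign(key, list(class = class, package = package, regex = regex),
           envir = .fileTypeRegistry)
    invisible(key)
}

# Hypothetical helper: find the single registered class whose regex
# matches every file name, optionally restricted to one package.
findFileType <- function(files, package = NULL) {
    entries <- mget(ls(.fileTypeRegistry, all.names = TRUE),
                    envir = .fileTypeRegistry)
    keep <- vapply(entries, function(e) {
        (is.null(package) || identical(e$package, package)) &&
            all(grepl(e$regex, files))
    }, logical(1))
    hits <- entries[keep]
    classes <- unique(vapply(hits, function(e) e$class, character(1)))
    if (length(classes) == 0L)
        stop("no registered file type matches all of: ",
             paste(files, collapse = ", "))
    if (length(classes) > 1L)
        stop("ambiguous file type: ", paste(classes, collapse = ", "))
    # A real makeFileType() would go on to call the constructor named
    # hits[[1L]]$class from the package hits[[1L]]$package.
    hits[[1L]]
}

# Same registrations as in the .onLoad() example quoted above.
registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")

findFileType(c("foo.bam", "bar.bam"))$class   # "BamFileList"
```

In this sketch a mixed vector such as c("a.bam", "b.bw") raises an error
because no single entry matches every file, and the optional package
argument narrows the search when two packages register the same extension.
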
>>>>>
>>>>>
>>>>> Valerie
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>>
>>>
>>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>