On Tue, Mar 11, 2014 at 8:57 PM, Valerie Obenchain <voben...@fhcrc.org> wrote:

> Hi,
>
> On 03/11/14 15:33, Hervé Pagès wrote:
>
>> On 03/11/2014 02:52 PM, Hervé Pagès wrote:
>>
>>> On 03/11/2014 09:57 AM, Valerie Obenchain wrote:
>>>
>>>> Hi Herve,
>>>>
>>>> On 03/10/2014 10:31 PM, Hervé Pagès wrote:
>>>>
>>>>> Hi Val,
>>>>>
>>>>> I think it would help understand the motivations behind this proposal
>>>>> if you could give an example of a method where the user cannot supply
>>>>> a file name but has to create a 'File' (or 'FileList') object first.
>>>>> And how the file registry proposal below would help.
>>>>> It looks like you have such an example in the GenomicFileViews package.
>>>>> Do you think you could give more details?
>>>>>
>>>>
>>>> The most recent motivating use case was in creating subclasses of
>>>> GenomicFileViews objects (BamFileViews, BigWigFileViews, etc.) We wanted
>>>> to have a general constructor, something like GenomicFileViews(), that
>>>> would create the appropriate subclass. However to create the correct
>>>> subclass we needed to know if the files were bam, bw, fasta etc.
>>>> Recognition of the file type by extension would allow us to do this with
>>>> no further input from the user.
>>>>
>>>
>>> That helps, thanks!
>>>
>>> Having this kind of general constructor sounds like it could indeed be
>>> useful. Would be an opportunity to put all these *File classes (the 22
>>> RTLFile subclasses defined in rtracklayer and the 5 RsamtoolsFile
>>> subclasses defined in Rsamtools) under the same umbrella (i.e. a parent
>>> virtual class) and use the name of this virtual class (e.g. File) for
>>> the general constructor.
>>>
>>> Allowing a registration mechanism to extend the knowledge of this File()
>>> constructor is an implementation detail. I don't see a lot of benefit to
>>> it. Only a package that implements a concrete File subclass would
>>> actually need to register the new subclass. Sounds easy enough to ask
>>> whoever has commit access to the File() code to modify it. This
>>> kind of update might also require adding the name of the package where
>>> the new File subclass is implemented to the Depends/Imports/Suggests
>>> of the package where File() lives, which is something that cannot be
>>> done via a registration mechanism.
>>>
>>
>> This clean-up of the *File jungle would also be a good opportunity to:
>>
>>   - Choose what we want to do with reference classes: use them for all
>>     the *File classes or for none of them. (Right now, those defined
>>     in Rsamtools are reference classes, and those defined in
>>     rtracklayer are not.)
>>
>>   - Move the I/O functionality currently in rtracklayer to a
>>     separate package. Based on the number of contributed packages I
>>     reviewed so far that were trying to reinvent the wheel because
>>     they had no idea that the I/O function they needed was actually
>>     in rtracklayer, I'd like to advocate for using a package name
>>     that makes it very clear that it's all about I/O.
>>
>
> Thanks for the suggestions. This re-org sounds good to me. As you say,
> unifying the *File classes in a single package would make them more visible
> to other developers and enforce consistent behavior.
>
> If you aren't in favor of a registration mechanism for 'discovery' how
> should a function with methods for many *File classes (e.g., import())
> handle a character file name? import() uses FileForFormat() to discover the
> file type, makes the *File class and dispatches to the appropriate *File
> method. The registry was an attempt at generalizing this concept.
>

Honestly, FileForFormat works now, and while it could certainly be
improved, perhaps there are bigger problems to solve?

> What do you think about the use of a registry for Vince's idea of holding
> a digest/path reference to large data but not installing it until it's
> used? Other ideas of how / where this could be stored?
>

I think that's an orthogonal problem, but a more important one.

> Val
>
>> H.
>>
>>> H.
>>>
>>>> Val
>>>>
>>>>> Thanks,
>>>>> H.
>>>>>
>>>>> On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm soliciting feedback on the idea of a general file 'registry' that
>>>>>> would identify file types by their extensions. This is similar in spirit
>>>>>> to FileForFormat() in rtracklayer but a more general abstraction that
>>>>>> could be used across packages. The goal is to allow a user to supply
>>>>>> only file name(s) to a method instead of first creating a 'File' class
>>>>>> such as BamFile, FaFile, BigWigFile etc.
>>>>>>
>>>>>> A first attempt at this is in the GenomicFileViews package
>>>>>> (https://github.com/Bioconductor/GenomicFileViews). A registry (lookup)
>>>>>> is created as an environment at load time:
>>>>>>
>>>>>>     .fileTypeRegistry <- new.env(parent=emptyenv())
>>>>>>
>>>>>> Files are registered with an information triplet consisting of class,
>>>>>> package and regular expression to identify the extension. In
>>>>>> GenomicFileViews we register FaFileList, BamFileList and BigWigFileList
>>>>>> but any 'File' class can be registered that has a constructor of the
>>>>>> same name.
>>>>>>
>>>>>>     .onLoad <- function(libname, pkgname)
>>>>>>     {
>>>>>>         registerFileType("FaFileList", "Rsamtools", "\\.fa$")
>>>>>>         registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
>>>>>>         registerFileType("BamFileList", "Rsamtools", "\\.bam$")
>>>>>>         registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
>>>>>>     }
>>>>>>
>>>>>> The makeFileType() helper creates the appropriate class. This function
>>>>>> is used behind the scenes to do the lookup and coerce to the correct
>>>>>> 'File' class.
>>>>>>
>>>>>>     > makeFileType(c("foo.bam", "bar.bam"))
>>>>>>     BamFileList of length 2
>>>>>>     names(2): foo.bam bar.bam
>>>>>>
>>>>>> New types can be added at any time with registerFileType():
>>>>>>
>>>>>>     registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
>>>>>>
>>>>>>
>>>>>> Thoughts:
>>>>>>
>>>>>> (1) If this sounds generally useful where should it live? rtracklayer,
>>>>>> GenomicFileViews or other? Alternatively it could be its own lightweight
>>>>>> package (FileRegister) that creates the registry and provides the
>>>>>> helpers. It would be up to the package authors that depend on
>>>>>> FileRegister to register their own file types at load time.
>>>>>>
>>>>>> (2) To avoid potential ambiguities maybe searching should be by regex
>>>>>> and package name. Still a work in progress.
>>>>>>
>>>>>>
>>>>>> Valerie
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
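
For anyone following the thread without the package at hand, here is a rough base-R sketch of the kind of registry described in the original post: an environment keyed by regex, holding (class, package) pairs. It is not the GenomicFileViews implementation. lookupFileType() and .lookupOne() are hypothetical helper names, keying the environment by regex is an assumption, and the makeFileType() shown here only assumes (as the post says) that each registered class has an exported constructor of the same name.

```r
# Hypothetical sketch of an extension-based file type registry (base R only).
.fileTypeRegistry <- new.env(parent = emptyenv())

# Store one (class, package) pair under the regex that identifies it.
registerFileType <- function(class, package, regex) {
    assign(regex, list(class = class, package = package),
           envir = .fileTypeRegistry)
    invisible(regex)
}

# Resolve a single file name to its registered entry (hypothetical helper).
.lookupOne <- function(file) {
    regexes <- ls(.fileTypeRegistry, all.names = TRUE)
    hits <- regexes[vapply(regexes, grepl, logical(1), x = file)]
    entries <- unique(unname(mget(hits, envir = .fileTypeRegistry)))
    if (length(entries) != 1L)
        stop("no unique registered file type for '", file, "'")
    entries[[1L]]
}

# Resolve a vector of file names; all of them must map to the same entry.
lookupFileType <- function(files) {
    entries <- unique(lapply(files, .lookupOne))
    if (length(entries) != 1L)
        stop("files do not map to exactly one registered class")
    entries[[1L]]
}

# Assumption: the constructor is exported under the class name, so fetch it
# from the registering package and call it on the file names.
makeFileType <- function(files) {
    entry <- lookupFileType(files)
    constructor <- getExportedValue(entry$package, entry$class)
    constructor(files)
}

registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")

lookupFileType(c("foo.bam", "bar.bam"))$class
lookupFileType(c("a.fa", "b.fasta"))$class
```

Only the lookup is exercised here, so the sketch runs without Rsamtools or rtracklayer installed; calling makeFileType() would need the registering package to be available. Mixed inputs such as c("a.bam", "b.bw") raise an error rather than guessing, which is one way to handle the ambiguity raised in point (2).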