Hi all,

I'm soliciting feedback on the idea of a general file 'registry' that would identify file types by their extensions. This is similar in spirit to FileForFormat() in rtracklayer, but would be a more general abstraction usable across packages. The goal is to let a user supply only file name(s) to a method instead of first creating an object of a 'File' class such as BamFile, FaFile, BigWigFile, etc.

A first attempt at this is in the GenomicFileViews package (https://github.com/Bioconductor/GenomicFileViews). A registry (lookup) is created as an environment at load time:

.fileTypeRegistry <- new.env(parent=emptyenv())

File types are registered with a triplet: the class, the package that defines it, and a regular expression that identifies the extension. In GenomicFileViews we register FaFileList, BamFileList and BigWigFileList, but any 'File' class that has a constructor of the same name can be registered.

.onLoad <- function(libname, pkgname)
{
    registerFileType("FaFileList", "Rsamtools", "\\.fa$")
    registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
    registerFileType("BamFileList", "Rsamtools", "\\.bam$")
    registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
}
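For illustration, here is a minimal, hypothetical sketch of what registerFileType() could look like. It assumes each entry is stored in the registry environment as a list(class=, package=, regex=) keyed by "package:regex"; the actual GenomicFileViews code may store entries differently.

```r
## Hypothetical sketch of registerFileType(); the storage layout
## (a list keyed by "package:regex") is an assumption.
.fileTypeRegistry <- new.env(parent = emptyenv())

registerFileType <- function(class, package, regex)
{
    stopifnot(is.character(class), length(class) == 1L,
              is.character(package), length(package) == 1L,
              is.character(regex), length(regex) == 1L)
    ## one entry per (package, regex) pair: a class can be registered for
    ## several extensions (e.g. .fa and .fasta), and two packages can
    ## register the same pattern without overwriting each other
    key <- paste(package, regex, sep = ":")
    assign(key, list(class = class, package = package, regex = regex),
           envir = .fileTypeRegistry)
    invisible(key)
}
```

Keying on package plus regex, rather than on the class, is what allows the two FaFileList registrations above to coexist.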

The makeFileType() helper creates an object of the appropriate class. It is used behind the scenes to look up the file type and coerce the file names to the correct 'File' class.

> makeFileType(c("foo.bam", "bar.bam"))
BamFileList of length 2
names(2): foo.bam bar.bam
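A rough sketch of how such a lookup could work, assuming registry entries are stored as list(class=, package=, regex=) in an environment (the `registry` argument and its default are assumptions for illustration). Every file must match exactly one registered type, all files must share that type, and the constructor is found with getExportedValue() because it has the same name as the class.

```r
## Hypothetical sketch of makeFileType(); not the GenomicFileViews code.
makeFileType <- function(files, registry = .fileTypeRegistry)
{
    entries <- mget(ls(registry, all.names = TRUE), envir = registry)
    ## label each registered entry by "package::class"
    ids <- vapply(entries, function(e) paste0(e$package, "::", e$class),
                  character(1))
    ## for each file, the distinct types whose regex matches its name
    typeOf <- vapply(files, function(f) {
        hit <- unique(ids[vapply(entries, function(e) grepl(e$regex, f),
                                 logical(1))])
        if (length(hit) != 1L)
            stop("'", f, "' matches ", length(hit), " registered file types")
        hit
    }, character(1))
    type <- unique(typeOf)
    if (length(type) != 1L)
        stop("files must share one file type; found: ",
             paste(type, collapse = ", "))
    parts <- strsplit(type, "::", fixed = TRUE)[[1L]]
    ## by convention the constructor has the same name as the class
    ctor <- getExportedValue(parts[[1L]], parts[[2L]])
    ctor(files)
}
```

Because matching is done per type rather than per regex, c("a.fa", "b.fasta") would resolve to a single FaFileList, while mixing .bam and .fa files is an error.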

New types can be added at any time with registerFileType():

registerFileType("NewClass", "NewPackage", "\\.NewExtension$")


Thoughts:

(1) If this sounds generally useful, where should it live? rtracklayer, GenomicFileViews or elsewhere? Alternatively, it could be its own lightweight package (FileRegister) that creates the registry and provides the helpers. It would then be up to the authors of packages that depend on FileRegister to register their own file types at load time.

(2) To avoid potential ambiguities (e.g. two packages registering the same extension), maybe the search should be by regex and package name. This is still a work in progress.
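One way point (2) might look, as a hypothetical sketch: findFileType() and its optional `package` argument are illustrative names, and entries are again assumed to be stored as list(class=, package=, regex=). When a file name matches entries from several packages, the caller must name the package to choose one.

```r
## Hypothetical sketch: search by regex, optionally narrowed by package.
findFileType <- function(file, registry = .fileTypeRegistry, package = NULL)
{
    entries <- mget(ls(registry, all.names = TRUE), envir = registry)
    ## keep entries whose regex matches and, if given, whose package agrees
    keep <- vapply(entries, function(e)
        grepl(e$regex, file) && (is.null(package) || e$package == package),
        logical(1))
    hits <- entries[keep]
    if (length(hits) == 0L)
        stop("no registered file type matches '", file, "'")
    if (length(hits) > 1L)
        stop("'", file, "' is ambiguous; supply 'package', one of: ",
             paste(vapply(hits, `[[`, character(1), "package"),
                   collapse = ", "))
    hits[[1L]]
}
```

Failing loudly on ambiguity, rather than silently picking the first match, keeps the result independent of the order in which packages were loaded.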


Valerie

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel