Or going the positive declarative route but arguably more informative: skip.to.fasta or fasta.only
I don't know the GFF format spec. Are we guaranteed that there will be only one embedded FASTA file and that it will be contiguous within the file? If not, the skip.to._ terminology would not technically be correct.

~G

On Mon, Mar 17, 2014 at 8:17 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:

> For direct reading of the sequence, the skip.non.fasta idea sounds good. An
> alternative for the name would be "skip.to.first.record". Up to you.
>
> Michael
>
> On Mon, Mar 17, 2014 at 5:33 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>
> > Hi Michael,
> >
> > On 03/17/2014 04:15 PM, Michael Lawrence wrote:
> >
> >> Hi Herve,
> >>
> >> What would be a clean way for rtracklayer to extract the (optional) FASTA
> >> data embedded in a GFF3 file and parse it as an XStringSet? Is there a
> >> low-level way to pass in-memory data to the parser in Biostrings?
> >
> > Not that it can be used here, but readDNAStringSet() has the 'skip' arg
> > which is analogous to the 'skip' arg of read.table(), except that, in
> > the case of readDNAStringSet(), it needs to be specified as the number
> > of records (FASTA or FASTQ) to skip before beginning to read in
> > records. So the assumption is that everything before the first record
> > to read is valid FASTA (or FASTQ). Which is of course not the case
> > with those GFF3 files with embedded FASTA data.
> >
> > However it would be easy to add another arg, say 'skip.non.fasta.lines',
> > to automatically skip lines that don't look like the header of a FASTA
> > record (i.e. that don't start with '>').
> >
> >> In terms of the API, import,GFFFile could return a GRanges with the
> >> DNAStringSet in the metadata(). Or there could be a method for
> >> readDNAStringSet on GFF3File that returns the DNAStringSet directly.
> >
> > The readDNAStringSet,GFF3File method seems cleaner than the metadata()
> > solution. It's also lower-level and would be needed behind the scene by
> > import,GFFFile, so I think it would make sense to start with it.
> > Implementing readDNAStringSet,GFF3File will be trivial once we have
> > something like the 'skip.non.fasta' arg. Should I go for it? Any better
> > suggestion for the name of this arg?
> >
> > Thanks,
> > H.
> >
> >> It turns out this functionality is useful when working with microbial
> >> genomes, where information tends to be passed around as Genbank files. For
> >> right now the easiest path seems to be to convert Genbank to GFF, but a
> >> Genbank parser in Bioc could be an eventual goal. It's a very complex file
> >> format.
> >>
> >> Michael
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpa...@fhcrc.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319

--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
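
For concreteness, here is a minimal sketch of the "skip to the first FASTA record" behaviour discussed in this thread. It is written in Python purely as an illustration; it is not the Biostrings implementation, and the function name read_embedded_fasta is hypothetical. It assumes the FASTA block is a single contiguous run that extends to the end of the input, which is exactly the guarantee being asked about above.

```python
import io


def read_embedded_fasta(lines):
    """Skip every line before the first '>' header, then parse FASTA records.

    Hypothetical helper for illustration only. Assumes the FASTA block is
    contiguous and runs to the end of the input.
    """
    records = {}
    name = None      # name of the record currently being read
    chunks = []      # sequence lines collected for that record
    for raw in lines:
        line = raw.rstrip("\r\n")
        if name is None and not line.startswith(">"):
            continue  # still in the non-FASTA (GFF) part of the file
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Toy GFF3 content with an embedded FASTA section (made-up data).
gff3_text = (
    "##gff-version 3\n"
    "ctg1\t.\tgene\t1\t8\t.\t+\t.\tID=gene1\n"
    "##FASTA\n"
    ">ctg1\n"
    "ACGTACGT\n"
    ">ctg2 second contig\n"
    "GGCC\n"
    "TTAA\n"
)

sequences = read_embedded_fasta(io.StringIO(gff3_text))
print(sequences)
```

If the FASTA block were not contiguous, any non-FASTA line after the first header would be silently appended to the preceding sequence in this sketch, which is why the contiguity question matters for both the behaviour and the name of the argument.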
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel