Michael Lawrence <[email protected]> writes:

> On Tue, Oct 27, 2009 at 3:53 PM, [email protected] <[email protected]> wrote:
>
>> Dear bioc-sig-sequencing,
>>
>> Previously, I tried the following with a UCSC available genome.
>>
>> genetable<-read.table("celegans_chrIII.txt", header=T, sep="\t")
>> promoter<-IRanges(start=genetable$txStart-1000*as.numeric(genetable$strand=="+"),
>>                   width=1000)
>>
>> It was suggested I might "check out the GenomicFeatures package, which has
>> utilities for working with a data.frame representation of the UCSC genes
>> table. For example, the 'transcripts' function will give you a set of
>> regions, including the promoters you're trying to generate."
>>
>> I have a genome, arabidopsis, apparently not available at the UCSC
>> database, but rather from TAIR.
>>
>> For this genome, might the GenomicFeatures package be similarly helpful? I
>> assume one might start with a file like TAIR9_GFF3_genes.gff from the TAIR
>> site? I note it has records for 'gene', 'mRNA', 'CDS', 'exon', perhaps
>> others?
>
> It could be useful, but you'll need to get it into the shape expected by the
> GenomicFeatures functions. With specific regard to transcripts(), it will
> need a 'chrom' column with the chromosome names, 'name' column for the gene
> name, and 'txStart' and 'txEnd' for the start and end of the transcripts,
> using UCSC coordinate conventions.
>
> I'm now thinking that it would be more convenient for the user if there was
> a transcripts method on a RangesList object, which could provide the
> necessary information. This could be extracted from the RangedData, which
> rtracklayer can create from your GFF file, with one caveat: if the GFF file
> contains a mixture of features (genes, exons, etc) and relies on the
> hierarchical features of GFF, it will take more work to get things into the
> right shape.
>
> The question then is where would this new functionality belong? The future
> of the GenomicFeatures package was a bit uncertain, the last time I checked.

The intention is that GenomicFeatures will mature to contain these sorts of
data structures and functions, so that it's easy to create or retrieve
transcript or exon information.

Martin

> Michael
>
>> Thanks,
>> P. Terry
>> [email protected]

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
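
Editorial sketch: the thread describes reshaping GFF records into a table
with 'chrom', 'name', 'txStart' and 'txEnd' columns in UCSC coordinate
conventions. The R code below is one possible way to do that with base R
only; it does not call GenomicFeatures or rtracklayer, and it is not the
method either package uses. The inlined records, the gene IDs, the choice
of 'mRNA' rows as transcripts, the assumption that ID= comes first in the
attributes column, and the promStart/promEnd column names are all
illustrative assumptions. GFF3 coordinates are 1-based and inclusive, while
UCSC gene tables use a 0-based start and a 1-based end, hence the
subtraction of 1 from the start.

```r
# Hypothetical GFF3-style records, inlined so the sketch runs on its own.
# IDs and coordinates are invented; a real TAIR file would be read from disk.
gff_lines <- c(
  "Chr1\tTAIR9\tgene\t5001\t7000\t.\t+\t.\tID=GENE_A",
  "Chr1\tTAIR9\tmRNA\t5001\t7000\t.\t+\t.\tID=GENE_A.1;Parent=GENE_A",
  "Chr1\tTAIR9\texon\t5001\t5400\t.\t+\t.\tParent=GENE_A.1",
  "Chr1\tTAIR9\tmRNA\t8001\t9000\t.\t-\t.\tID=GENE_B.1;Parent=GENE_B"
)

# Parse the nine tab-separated GFF3 columns.
gff <- read.table(
  text = gff_lines, sep = "\t", quote = "", comment.char = "#",
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"),
  colClasses = c("character", "character", "character", "integer", "integer",
                 "character", "character", "character", "character")
)

# Keep one feature type so the mixed gene/mRNA/exon hierarchy is not an issue.
mrna <- gff[gff$type == "mRNA", ]

# Build the table shape described in the thread.
# Assumption: the attributes field starts with "ID=<identifier>".
tx <- data.frame(
  chrom   = mrna$seqid,
  name    = sub("^ID=([^;]+).*$", "\\1", mrna$attributes),
  strand  = mrna$strand,
  txStart = mrna$start - 1L,  # GFF3 1-based start -> UCSC 0-based start
  txEnd   = mrna$end,         # end is unchanged under the UCSC convention
  stringsAsFactors = FALSE
)

# Strand-aware promoters: the 1000 bases immediately upstream of the
# transcription start site, reported as 1-based inclusive positions.
plus <- tx$strand == "+"
tss  <- ifelse(plus, tx$txStart + 1L, tx$txEnd)
tx$promStart <- pmax(1L, ifelse(plus, tss - 1000L, tss + 1L))
tx$promEnd   <- ifelse(plus, tss - 1L, tss + 1000L)

print(tx)
```

For a minus-strand transcript the upstream region lies beyond txEnd rather
than before txStart, which is why the sketch branches on strand instead of
subtracting from txStart for every row. The sketch does not clip promoters
at the chromosome end, since chromosome lengths are not part of the GFF
records used here.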
