On 10/31/09 4:14 PM, Michael Lawrence wrote:
On Sat, Oct 31, 2009 at 2:15 PM, Martin Morgan<[email protected]> wrote:
Hi Steve --
Steve Lianoglou<[email protected]> writes:
Hi,
On Oct 28, 2009, at 3:44 PM, Martin Morgan wrote:
<snip>
It could be useful, but you'll need to get it into the shape
expected by the
GenomicFeatures functions. With specific regard to transcripts(),
it will
need a 'chrom' column with the chromosome names, 'name' column for
the gene
name, and 'txStart' and 'txEnd' for the start and end of the
transcripts,
using UCSC coordinate conventions.
I'm now thinking that it would be more convenient for the user if
there was
a transcripts method on a RangesList object, which could provide the
necessary information. This could be extracted from the RangedData,
which
rtracklayer can create from your GFF file, with one caveat: if the
GFF file
contains a mixture of features (genes, exons, etc) and relies on the
hierarchical features of GFF, it will take more work to get things
into the
right shape.
The question then is where would this new functionality belong? The
future
of the GenomicFeatures package was a bit uncertain, the last time I
checked.
The intention is that GenomicFeatures will mature to contain these
sorts of data structures and functions, so that it's easy to create or
retrieve transcript or exon information.
</snip>
I've actually been working on a package of my own that does a lot of
this stuff, since I've been working lately with rna-seq data.
It takes an input file similar to what you get from the ucsc table
browser with gene, tx/cds -start/-end, exon, etc. columns and whips up
an annotation package for the genome/annotation source you are using
(info is stored in an SQLite database). You can then query the
database for genes of interest by name, chromosome, or bounded (start/
end) chromosome position for further downstream use.
Usually results are returned to the user as lists of gene objects,
which the user would have to whip into a data.frame format if that's
what's desired.
I'll post some code below to show you what I mean, and see if you all
think it's helpful. I'm not sure that it's ready for public
consumption, but I guess I just wanted to mention that it was my
intention to develop this a bit further in the near future. If it's
useful for other people, all the better.
Just wanted to acknowledge this email. I like how straight-forward
your interface is, and the use of appropriate classes. RTree and the
custom compile is a little problematic;
I'm wondering if the custom compile is really necessary? Is it not possible
for rsqlite to provide that feature by default?
As far as I know, recent versions of RSQLite have the RTree support
enabled. The following, taken from the RTree documentations
(http://www.sqlite.org/rtree.html), seems to work (RSQLite_0.7-3 DBI_0.2-4):
library("RSQLite")
sql = "
CREATE VIRTUAL TABLE demo_index USING rtree(
id, -- Integer primary key
minX, maxX, -- Minimum and maximum X coordinate
minY, maxY -- Minimum and maximum Y coordinate
)
"
db <- dbConnect(SQLite())
dbGetQuery(db, sql)
ins1 <- "
INSERT INTO demo_index VALUES(
1, -- Primary key
-80.7749, -80.7747, -- Longitude range
30.3776, 30.3778 -- Latitude range
)
"
ins2 <- "
INSERT INTO demo_index VALUES(
2,
-81.0, -79.6,
35.0, 36.2
)
"
dbGetQuery(db, ins1)
dbGetQuery(db, ins2)
sqlq <- "SELECT id FROM demo_index WHERE minX>=-81.08 AND maxX<=-80.58
AND minY>=30.00 AND maxY<=35.44"
dbGetQuery(db, sqlq)
In any case, it would be great to continue the conversation.
+ seth
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing