----------------------------------------
> Date: Thu, 2 Apr 2009 16:41:51 +0100
> From: [email protected]
> To: [email protected]
> Subject: Re: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank?
>
> Hi
>
> I would do it programmatically. You do not even need to know much of PERL to
> create your own simple scripts and the ENSEMBL APIs.
>
I was using bash scripts with various tools (sed/awk) to parse BLAST output from short probe queries, and then using wget or curl to request the genome sequence near the hits. Alternatively, you can download the complete genomes locally and use your favorite random-access facility (Perl would work) to get the pieces you want. IIRC, I then used my own C++ code for various tests. For unrelated work on splicing, many arguable splicing cues could be formulated as regular expressions with reverse-complement matches. You can also set up your own local BLAST DB, or get other patterns or rules against which to search. I'm not sure whether there are canned tools, but it isn't hard to do a lot of this locally once you get coarse hits for marginal candidates.

>
> Go to http://www.ensembl.org and look for the APIs in the Docs & FAQ's
> section.
> It is full of instructions and examples.
>
> Good luck
> Pedro
>
> --
> Pedro Fernandes
> Centro Português de Bioinformática
> Quoting dale richardson :
>
>>
>> So my question is this:
>>
>> What is the most efficient way to obtain a set of cDNA sequences that
>> match to a set of genomic DNA sequences while excluding spurious
>> hits , RefSeq sequences and "pseudo" full length cDNAs?
>>
>> As you can imagine, I am interesting in looking for alternative splice
>> variants for a number of genes.

_______________________________________________
BBB mailing list
[email protected]
http://www.bioinformatics.org/mailman/listinfo/bbb
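
To illustrate the remark in the reply above about writing splicing cues as regular expressions with reverse-complement matches, here is a minimal sketch. It is in Python purely for illustration (the thread itself mentions Perl, bash and C++), and the function names and the toy donor-site-like pattern GT[AG]AGT are assumptions made for this example, not something taken from the original scripts. The idea is to search the sequence as given, then search its reverse complement with the same pattern, and map the second set of hits back to forward-strand coordinates.

```python
import re

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_both_strands(pattern, seq):
    """Search seq and its reverse complement for a regex pattern.

    Returns a list of (strand, start, end, matched_text) tuples.
    start/end are 0-based, end-exclusive positions on the forward
    strand, whichever strand the match was found on.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    n = len(seq)
    hits = []

    # Forward strand: coordinates are used as reported by the regex engine.
    for m in regex.finditer(seq):
        hits.append(("+", m.start(), m.end(), m.group()))

    # Reverse strand: search the reverse complement, then flip coordinates
    # so they refer to positions in the original (forward) sequence.
    for m in regex.finditer(reverse_complement(seq)):
        hits.append(("-", n - m.end(), n - m.start(), m.group()))

    return sorted(hits, key=lambda h: (h[1], h[0]))
```

Calling find_both_strands("GT[AG]AGT", "AAGTAAGTCCACTTACAA") reports one hit on each strand. Note that re.finditer returns non-overlapping matches only, so overlapping sites would need a lookahead-style pattern; real splice-site models are also usually fuzzier than a single regular expression, so treat this as a coarse first-pass filter.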
