What is the size of the genomes of the species you use? Do you have them locally? If the genome size is smaller than the RAM on your computer, a simple approach could be:
1) Merge all your sequences into a single sequence, with ~100 'N' characters between them.
2) Merge all the genomes.
3) Find repeats (common hits) between the two resulting sequences.

On Thu, Apr 2, 2009 at 11:09 PM, Mike Marchywka <[email protected]> wrote:
>
> ----------------------------------------
> > Date: Thu, 2 Apr 2009 16:41:51 +0100
> > From: [email protected]
> > To: [email protected]
> > Subject: Re: [BiO BB] Efficient way to retrieve full length cDNA sequences from GenBank?
> >
> > Hi
> >
> > I would do it programmatically. You do not even need to know much Perl
> > to create your own simple scripts with the Ensembl APIs.
> >
>
> I was using bash scripts with various things (sed/awk) to parse BLAST output
> on short probe queries, and then using wget or curl to request the
> genome sequence near the hits (alternatively, you can just download
> the complete genomes locally and use your favorite random-access
> facility, e.g. Perl, to get the pieces you want).
> IIRC, I then used my own C++ code for various tests.
>
> For unrelated work on splicing, many arguable splicing cues could be
> formulated as regular expressions with reverse-complement matches.
> You can also set up your own local BLAST database or get other patterns
> or rules against which to search. I'm not sure if there are canned
> tools, but it isn't hard to do a lot of this locally once you
> get coarse hits for marginal candidates.
>
> > Go to http://www.ensembl.org and look for the APIs in the Docs & FAQ's section.
> > It is full of instructions and examples.
> >
> > Good luck
> > Pedro
> >
> > --
> > Pedro Fernandes
> > Centro Português de Bioinformática
> >
> > Quoting dale richardson:
> >
> >>
> >> So my question is this:
> >>
> >> What is the most efficient way to obtain a set of cDNA sequences that
> >> match a set of genomic DNA sequences while excluding spurious
> >> hits, RefSeq sequences and "pseudo" full-length cDNAs?
> >>
> >> As you can imagine, I am interested in looking for alternative splice
> >> variants for a number of genes.
>
> _______________________________________________
> BBB mailing list
> [email protected]
> http://www.bioinformatics.org/mailman/listinfo/bbb

--
Mikhail Fursov
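The three steps suggested at the top of this reply can be sketched in Python for small inputs. This is only a minimal illustration under stated assumptions, not the procedure of any particular tool: the sequences are assumed to be already loaded as (name, sequence) pairs, "repeats (common hits)" is approximated by exact shared k-mers merged along diagonals, only the forward strand is searched, and all function names (concatenate, locate, common_kmers, merge_hits, find_common_regions) are hypothetical. The k-mer index is a plain Python dict, so this only makes sense when everything fits comfortably in RAM; for whole genomes a dedicated repeat finder or aligner is the practical choice.

```python
from bisect import bisect_right

# Assumption: ~100 'N' characters between records, as in step 1 of the reply.
SPACER = "N" * 100


def concatenate(records):
    """Join (name, sequence) pairs into one string separated by N spacers.

    Returns the merged sequence plus the start offset and name of each record,
    so that positions in the merged string can be mapped back later.
    """
    parts, starts, names = [], [], []
    pos = 0
    for name, seq in records:
        if parts:
            parts.append(SPACER)
            pos += len(SPACER)
        starts.append(pos)
        names.append(name)
        parts.append(seq.upper())
        pos += len(seq)
    return "".join(parts), starts, names


def locate(pos, starts, names):
    """Map a position in a merged sequence back to (record name, local offset)."""
    i = bisect_right(starts, pos) - 1
    return names[i], pos - starts[i]


def common_kmers(query, target, k):
    """Yield (query_pos, target_pos) for every exact k-mer the two strings share.

    K-mers containing 'N' are skipped, so a hit never spans a spacer.
    Only the forward strand is searched (no reverse complement).
    """
    index = {}
    for j in range(len(target) - k + 1):
        kmer = target[j:j + k]
        if "N" not in kmer:
            index.setdefault(kmer, []).append(j)
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        if "N" not in kmer:
            for j in index.get(kmer, ()):
                yield i, j


def merge_hits(hits, k):
    """Collapse runs of consecutive k-mer hits on the same diagonal
    into (query_start, target_start, length) regions."""
    merged = []
    for qi, ti in sorted(hits, key=lambda h: (h[1] - h[0], h[0])):
        if merged:
            mq, mt, length = merged[-1]
            # Same diagonal, and this k-mer starts right after the last one in the run.
            if ti - qi == mt - mq and qi <= mq + length - k + 1:
                merged[-1] = (mq, mt, max(length, qi - mq + k))
                continue
        merged.append((qi, ti, k))
    return merged


def find_common_regions(queries, targets, k=20):
    """Steps 1-3: merge the queries, merge the targets, report shared regions."""
    q, q_starts, q_names = concatenate(queries)
    t, t_starts, t_names = concatenate(targets)
    regions = []
    for qi, ti, length in merge_hits(common_kmers(q, t, k), k):
        q_name, q_off = locate(qi, q_starts, q_names)
        t_name, t_off = locate(ti, t_starts, t_names)
        regions.append((q_name, q_off, t_name, t_off, length))
    return regions


if __name__ == "__main__":
    cdnas = [("cdna1", "ACGTACGTTTGA"), ("cdna2", "GGGCCCAAAT")]
    genomes = [("chrA", "TTTTACGTACGTTTTT"), ("chrB", "CCGGGCCCAAG")]
    print(find_common_regions(cdnas, genomes, k=6))
    # → [('cdna1', 0, 'chrA', 4, 10), ('cdna2', 0, 'chrB', 2, 8)]
```

Because any k-mer containing 'N' is skipped, no reported region crosses a spacer, and locate() maps each hit back to its original record and offset. Raising k trades sensitivity for fewer short, spurious hits.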
