One more thing: if you write a homologous recombination function, I would also add a mutator function to mimic genetic drift. It can be sophisticated about which mutations it allows with respect to the codon table, and the mutations can be distributed according to a known function of percent drift/difference. That lets you tune the amount of drift, so you can test detection of the originating sequences not only by domain but also by drift criteria.
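To make the mutator idea concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: it works on an uppercase nucleotide coding sequence, uses the standard genetic code, applies point substitutions only (no indels), and spreads them uniformly at random rather than by any fitted drift function. The names `drift`, `translate` and `percent_difference` are made up for this example.

```python
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def percent_difference(a, b):
    """Percent of aligned positions that differ between two equal-length sequences."""
    return 100 * sum(x != y for x, y in zip(a, b)) / len(a)


def drift(cds, percent, synonymous_only=False, rng=None):
    """Return a copy of `cds` with up to `percent` % of its positions substituted.

    Substitutions that turn a sense codon into a stop codon are rejected.
    With synonymous_only=True, only substitutions that keep the encoded
    amino acid are accepted, so the target may not be reached.
    """
    rng = rng or random.Random()
    seq = list(cds)
    coding_len = len(seq) - len(seq) % 3
    target = round(len(seq) * percent / 100)
    positions = list(range(coding_len))
    rng.shuffle(positions)  # uniform placement; swap in another distribution if needed
    changed = 0
    for pos in positions:
        if changed >= target:
            break
        start, offset = pos - pos % 3, pos % 3
        old_codon = "".join(seq[start:start + 3])
        old_aa = CODON_TABLE[old_codon]
        candidates = [b for b in BASES if b != seq[pos]]
        rng.shuffle(candidates)
        for base in candidates:
            new_codon = old_codon[:offset] + base + old_codon[offset + 1:]
            new_aa = CODON_TABLE[new_codon]
            if new_aa == "*" and old_aa != "*":
                continue  # would introduce a premature stop
            if synonymous_only and new_aa != old_aa:
                continue  # would change the protein
            seq[pos] = base
            changed += 1
            break
    return "".join(seq)


if __name__ == "__main__":
    original = "ATGGCTGCAAAGCTGTTTGGAGATCCTTAA"
    mutated = drift(original, 20, rng=random.Random(1))
    print(original, translate(original))
    print(mutated, translate(mutated), percent_difference(original, mutated))
```

Running this at a range of `percent` values against the chimeras would give a curve of how far a sequence can drift before each search tool stops finding its origin.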
D

On Feb 12, 2008 11:46 AM, DT <[EMAIL PROTECTED]> wrote:
> By the way, nr is ftp-able from NCBI and is a protein-based database if
> you didn't know.
>
> On Feb 12, 2008 11:44 AM, DT <[EMAIL PROTECTED]> wrote:
> > On Feb 11, 2008 6:56 PM, Theodore H. Smith <[EMAIL PROTECTED]> wrote:
> > > On 11 Feb 2008, at 22:28, Ryan Golhar wrote:
> > > > Why don't you write up a paper describing the algorithm in detail and
> > > > submit it to a bioinformatics journal? And, why not make the
> > > > executable available with documentation so that people can download
> > > > it and try it out for themselves.
> > > >
> > > > Do you have any test cases that show it runs faster/better than BLAST?
> > > > Describe them and make them available.
> > >
> > > The first thing I'd need to do is make a good test. I'm not sure what
> > > constitutes "a good test", in this case.
> >
> > NR ALL VS ALL: This will test speed and somehow test performance. The
> > nr database (non-redundant) from NCBI is a good place to start testing as a
> > template database. I'd use your algorithm all-against-all in nr. Test
> > against BLAST and then use your algorithm for each entry in nr versus all
> > of nr, and then compare performance. You can generate a ROC plot for BLAST
> > vs your algorithm against a known set of homologs and distant homologs,
> > based on a p-value or significance level cutoff.
> >
> > A real randomization test would be this to test sensitivity and
> > specificity: take known sequences in nr -- all or some of them -- and
> > scramble them by "homologous recombination" -- create chimeras of known
> > sequences by different randomization criteria -- by domain (criteria based
> > on domain annotation) or by individual sequence based on a known
> > randomization function, and then test the sensitivity and specificity of
> > BLAST vs your algorithm to detect the originating sequences that created the
> > chimeras.
> >
> > You will also need to check the performance of your algorithm against
> > nucleotide sequences. There are already test cases in BLAST for
> > mouse-vs-human, that would be a good test case.
> >
> > Deanne Taylor

_______________________________________________
BBB mailing list
[email protected]
http://www.bioinformatics.org/mailman/listinfo/bbb
