Dr. Jones,

I think the time is ripe for biojava interfaces to peptide-mass fingerprinting search algorithms, and I am glad to see some interest brewing. Let me express my interest and give the list some of my reasons. Based on what I have worked with from your previous JPAT library and the digestion classes in biojava, I am eager to see what you come up with.

I am currently working on an implementation of the ProFound algorithm, eventually to be used for in-house peptide mass fingerprinting. The performance of my current implementation parallels the web-accessible version of ProFound at http://129.85.19.192/profound_bin/WebProFound.exe?FORM=1 in terms of speed and discriminability. I am designing it to allow the user to search over all possible post-translational modifications, with user-defined parameters that reduce the false-positive and false-negative rates which confound modification searching. Additional parameters include error tolerance, pI, protein MW, etc. Because of the size of the search space and the complexity of the Bayesian calculations, it is written entirely in ANSI C on Solaris, with only a command-line interface and textual output.

Eventually I would be interested in collaborating to design interfaces to the algorithm, but I am not sure how robust C/Java interfaces are. I haven't studied the SeqSimilaritySearchResult interface very much, but my first impression is that you may want to create a new interface to encompass the large amount of data returned from a typical search. In the web version of ProFound (as well as the others, like MSFit), the top hits are returned with MW, peptide hits, errors, peptide hit sequences, as well as the normalized probability score. The advantage of interfacing with an algorithm, as opposed to a results parser, would be the opportunity to capture more information.

The scores from ProFound are displayed as normalized Bayesian probabilities which all add up to 1; however, in many cases it is also useful to know the likelihood prior to normalization, as well as a number of other functions used in the probability calculation which are very useful in the identification process. Obtaining the likelihood for each hit allows one to compare scores across different searches, and to do statistical testing to estimate false-positive rates.
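
As a rough illustration of that last point, here is a minimal Java sketch of a hit object that keeps both the raw likelihood and the normalized score. It is not ProFound's actual calculation and not an existing biojava API: MassSearchSketch, MassSearchHit, its getters, and normalize() are hypothetical names, the scoring is plain division by the sum, and the protein IDs, masses, and likelihood values are invented for the example.

```java
/**
 * Hypothetical sketch: none of these types exist in biojava, and the
 * scoring here is plain normalization, not ProFound's Bayesian model.
 */
public class MassSearchSketch {

    /** One candidate protein returned by a peptide-mass search (assumed shape). */
    public interface MassSearchHit {
        String getProteinId();
        double getProteinMass();            // protein MW in Da
        int getMatchedPeptideCount();       // number of query masses matched
        double getLikelihood();             // score before normalization
        double getNormalizedProbability();  // likelihood / sum over all hits
    }

    /** Immutable value class backing the interface. */
    private static final class SimpleHit implements MassSearchHit {
        private final String id;
        private final double mass;
        private final int matched;
        private final double likelihood;
        private final double probability;

        SimpleHit(String id, double mass, int matched,
                  double likelihood, double probability) {
            this.id = id;
            this.mass = mass;
            this.matched = matched;
            this.likelihood = likelihood;
            this.probability = probability;
        }

        public String getProteinId() { return id; }
        public double getProteinMass() { return mass; }
        public int getMatchedPeptideCount() { return matched; }
        public double getLikelihood() { return likelihood; }
        public double getNormalizedProbability() { return probability; }
    }

    /** Builds hits whose normalized probabilities sum to 1 within one search. */
    public static MassSearchHit[] normalize(String[] ids, double[] masses,
                                            int[] matched, double[] likelihoods) {
        if (ids.length != masses.length || ids.length != matched.length
                || ids.length != likelihoods.length) {
            throw new IllegalArgumentException("input arrays must have equal length");
        }
        double total = 0.0;
        for (double l : likelihoods) {
            if (l < 0.0) {
                throw new IllegalArgumentException("likelihoods must be non-negative");
            }
            total += l;
        }
        if (!(total > 0.0)) {
            throw new IllegalArgumentException("at least one likelihood must be positive");
        }
        MassSearchHit[] hits = new MassSearchHit[ids.length];
        for (int i = 0; i < ids.length; i++) {
            // Keep the raw likelihood so hits stay comparable across searches.
            hits[i] = new SimpleHit(ids[i], masses[i], matched[i],
                                    likelihoods[i], likelihoods[i] / total);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented example values, for illustration only.
        String[] ids = {"protA", "protB", "protC"};
        double[] masses = {45210.5, 23870.1, 66015.9};
        int[] matched = {9, 5, 3};
        double[] likelihoods = {6.0, 3.0, 1.0};
        for (MassSearchHit h : normalize(ids, masses, matched, likelihoods)) {
            System.out.printf("%s MW=%.1f matched=%d L=%.3g P=%.3f%n",
                    h.getProteinId(), h.getProteinMass(),
                    h.getMatchedPeptideCount(), h.getLikelihood(),
                    h.getNormalizedProbability());
        }
    }
}
```

The normalized value only ranks hits within a single search; exposing getLikelihood() next to it is what would allow comparison across searches and the kind of statistical testing mentioned above.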

All in all, a clean interface to such an algorithm would be immensely useful, not just for doing single searches, but also for automating thousands at a time, and for automated statistical testing as in: J. Eriksson, B.T. Chait, and D. Fenyö, "A Statistical Basis for Testing the Significance of Mass Spectrometric Protein Identification Results", Analytical Chemistry 72 (2000) 999-1005. I'm planning to publish the work soon, and subsequently would be interested in speaking with you about the design of such interfaces.

--
Will Old, Ph.D.
Research Associate
Center for Computational Pharmacology
http://compbio.uchsc.edu/
Univ. Colo. Health Sci. Center
303-315-1102
[EMAIL PROTECTED]

-----Original Message-----
From: Michael Jones [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 26, 2001 9:36 AM
To: [EMAIL PROTECTED]
Subject: [Biojava-l] Mass Search Results

I am thinking about creating some biojava interfaces and implementations for peptide-mass fingerprint and peptide fragment mass searches of sequence databases. I would like to make it general enough that it could be used to wrap some of the popular search tools, so I need to abstract out things like scoring schemes. In general the input would be a set of masses (protein and peptide, or fragments and parent peptide), an error tolerance, and other filters. The output would be a set of protein or nucleotide sequences along with their associated scores, possibly with the matches annotated as features onto the returned sequences.

I have been looking at some of the interfaces used for FastA searches, but I am not sure that they are appropriate for the problem above. For example, the SearchBuilder has as one of its methods SeqSimilaritySearchResult makeSearchResult(). A SeqSimilaritySearchResult has a method getQuerySequence() that is not appropriate for the mass search problem. What do people think? Should I go ahead and use them and just ignore getQuerySequence(), or should I create new interfaces?
Perhaps I could just extend SeqSimilaritySearchResult and add a getQueryMassSet method, or use the same interface and put the masses into the SearchParameters Map. Also, according to the documentation, these interfaces seem to be designed to handle parsing of results, not algorithm implementations. Are there other interfaces that would be more appropriate for search algorithm implementations?

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l