Dear All,

I wanted to propose an analytical tool for BioJava.

For example, suppose we have large datasets with complete pathway information and the related annotations (e.g. the p53 pathway with all the genes, proteins, miRNAs involved, etc.). Could we find the location of a specific unknown (newly predicted) protein or gene on a predicted pathway? This is a suggestion for the kind of things we could do on the analytical side. Could we think of doing something of this sort in BioJava, or at least make it capable of handling such aspects?

Any ideas / comments are most welcome...

Regards,
Jitesh Dundas

On 4/17/10, jitesh dundas <[email protected]> wrote:
> Hi Everyone,
>
> I went through the URLs sent by Dr Chapman. Interesting work that you
> are doing here. :)
>
> I was wondering if there is anyone who could consider these. I
> would also like to be a part of the research work being carried out
> using BioJava (especially in sequence alignment and miRNA signature
> analysis, especially for cancers).
>
> 1) A set of tools for converting flat data (e.g. sequence strings,
> taxonomy strings) into BioJava-like objects (e.g. SymbolLists,
> NCBITaxon). These BioJava-like objects could then be used for more
> advanced applications.
> A set of tools for manipulating the BioJava-like objects.
>
> 2) Module?: biojava-ws-blast Module?: biojava-ws-biolit
> Proposed Module: biojava-j2ee Lead: Mark Schreiber
>
> - This would probably take the form of SessionBeans and WebServices
> that can be deployed to Glassfish/JBoss etc. to provide biological
> services for people who want to make client-server or SOA apps.
>
> 3) I also liked what Mr. Gang Wu is working on (I read the
> discussions). I was wondering if I could
> do something of that sort...
>
> May I request the leads to tell me how I could chip in...
>
> Regards,
> Jitesh Dundas
>
>
>
> On 4/16/10, Mark Chapman <[email protected]> wrote:
>> A great place to start finding ideas is the wiki.
>> Both http://biojava.org/wiki/BioJava:Modules
>> and http://biojava.org/wiki/BioJava3_Proposal
>> list the next steps planned/desired for BioJava.
>>
>> What research area did you have in mind?
>>
>> Have fun,
>> Mark
>>
>>
>> On 4/16/2010 8:57 AM, jitesh dundas wrote:
>>> Dear Sir,
>>>
>>> I am very interested in contributing to this project.
>>>
>>> I am looking for a good problem, more on the research side. I can also
>>> help in coding (I also work as a software
>>> engineer: j2ee/eclipse/jboss/tomcat ...).
>>>
>>> Anything that I could work on...
>>>
>>> Regards,
>>> Jitesh Dundas
>>>
>>> On 4/8/10, Andreas Dräger <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> This e-mail is just for your information about somebody new, who'd like
>>>> to contribute to our project.
>>>>
>>>> Cheers
>>>> Andreas
>>>>
>>>>
>>>> Subject: Re: Fwd: Proposing a project on "Biojava alignment lead"
>>>> From: Andreas Dräger <[email protected]>
>>>> Date: Wed, 07 Apr 2010 09:27:13 +0200
>>>> To: Cai Shaojiang <[email protected]>
>>>>
>>>> Hi Cai Shaojiang,
>>>>
>>>> Thank you for your e-mail! I don't know what happened to the e-mail
>>>> list. Sometimes it takes a while due to the spam filters, I guess.
>>>>
>>>> > I am a PhD student from National University of Singapore. My major
>>>> > research area is local alignment algorithms and data structures for SNP
>>>> > identification. And I have used Java and Eclipse for years for software
>>>> > development. I am very interested in your GSoC programme. I find that
>>>> > there is a module called "biojava-alignment lead" whose mentor is you.
>>>> > I want to propose a new project on this module. I have several questions
>>>> > about this module.
>>>>
>>>> Yes, that's me. So great to get your support.
>>>>
>>>> > 1. It seems that pairwise alignment is to find similarity between two
>>>> > short sequences. Existing pairwise alignment is based on dynamic
>>>> > programming; is it the Smith-Waterman algorithm?
>>>>
>>>> So, currently, BioJava contains three different alignment approaches.
>>>> There are two deterministic algorithms, i.e., Smith-Waterman for local
>>>> alignment and Needleman-Wunsch for global alignment. Third, there is
>>>> the possibility to apply Hidden Markov Models for alignment. An example
>>>> of the latter approach should be in the cookbook.
>>>>
>>>> > 2. What is the exact task of "refactoring of underlying data
>>>> > structures"?
>>>>
>>>> Yes, this is something I did last week already, but it could still be
>>>> improved. The problem was that the alignment algorithms actually
>>>> produced a kind of string that looks similar to the output of BLAST.
>>>> This string contained the score, the computation time, the length of
>>>> the alignment etc. The problem was that people wanted to perform
>>>> higher-level computation on the score value or evaluate some other
>>>> information. Now, the alignment will produce a data structure that
>>>> contains all the information and can, in addition to that, also produce
>>>> such a BLAST-like output. There is, however, still the following
>>>> problem: The data structure requires both sequences in the pair-wise
>>>> alignment to have an identical length. In case of local alignment this
>>>> is especially stupid (actually), because gaps are inserted to fill the
>>>> sequences. And then the data structure tries to keep the old sequence
>>>> coordinates, leading to the effect that the numbers "query start",
>>>> "query end", "subject start", and "subject end" are required to shift
>>>> the sequences against each other when displaying the output. So, you
>>>> cannot easily print the sequences below each other; you first have
>>>> to shift them. Please check out the latest version of this package via
>>>> anonymous svn and have a look ;-)
>>>>
>>>> > 3. My existing research area is aiming to deal with aligning short
>>>> > reads (10s~100s bp) against extremely long sequences (e.g., human
>>>> > genome). As far as I know, there are no such alignment tools
>>>> > implemented in Java. Would you consider this direction?
>>>>
>>>> See, this would be very nice to include. But this requires that we no
>>>> longer fill the short sequence with many, many gap symbols (just a
>>>> waste of memory), but improve the data structure. There is already an
>>>> UnequalLengthAlignment (just a data structure, no algorithm) and I
>>>> think we could use this as a starting point. Then your algorithm should
>>>> only produce such a data structure and this would be fine.
>>>>
>>>> > 4. It seems that the existing tools just lack some
>>>> > refactoring and representation interfaces. Any more underlying tasks?
>>>>
>>>> Hm. Yes: With the release of BioJava 3, data structures have changed
>>>> again. So maybe there's also some adaptation to the new structure
>>>> required.
>>>>
>>>> > I have been keeping an eye on GSoC since last month, but sorry to find
>>>> > out that I sent the initial email to the mailing list before I
>>>> > subscribed to it...
>>>>
>>>> Ok. Sounds good. Thanks for your interest. So I suggest: Download the
>>>> latest trunk, have a look, play around and if you can improve something
>>>> we'll put it into the trunk and write your name into the authors' tag.
>>>>
>>>> Cheers
>>>> Andreas
>>>>
>>>> --
>>>> Dipl.-Bioinform. Andreas Dräger
>>>> Eberhard Karls University Tübingen
>>>> Center for Bioinformatics (ZBIT)
>>>> Sand 1
>>>> 72076 Tübingen
>>>> Germany
>>>>
>>>> Phone: +49-7071-29-70436
>>>> Fax: +49-7071-29-5091
>>>> _______________________________________________
>>>> Biojava-l mailing list - [email protected]
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
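
To make the data-structure point in the thread concrete: the local alignment result can carry the four coordinates ("query start", "query end", "subject start", "subject end") next to the aligned region only, so neither sequence has to be padded with gaps to match the other's length. What follows is only a minimal sketch in plain Java, not the BioJava implementation. The class name LocalAlignmentSketch, the Result type, the linear gap penalty, and the scoring values are all assumptions chosen for illustration.

```java
/**
 * Minimal Smith-Waterman sketch (linear gap penalty). Hypothetical names;
 * this does not use or mirror the BioJava alignment API.
 */
final class LocalAlignmentSketch {

    /** Alignment result: score, 0-based half-open coordinates, aligned region only. */
    static final class Result {
        final int score;
        final int queryStart, queryEnd;     // region [queryStart, queryEnd) of the query
        final int subjectStart, subjectEnd; // region [subjectStart, subjectEnd) of the subject
        final String alignedQuery, alignedSubject; // gaps ('-') only inside the aligned region

        Result(int score, int qs, int qe, int ss, int se, String aq, String as) {
            this.score = score;
            this.queryStart = qs;
            this.queryEnd = qe;
            this.subjectStart = ss;
            this.subjectEnd = se;
            this.alignedQuery = aq;
            this.alignedSubject = as;
        }
    }

    static Result align(String query, String subject, int match, int mismatch, int gap) {
        int n = query.length(), m = subject.length();
        int[][] h = new int[n + 1][m + 1]; // first row and column stay 0
        int best = 0, bi = 0, bj = 0;

        // Fill the score matrix; a cell never drops below 0 (local alignment).
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = query.charAt(i - 1) == subject.charAt(j - 1) ? match : mismatch;
                int v = Math.max(0, Math.max(h[i - 1][j - 1] + sub,
                        Math.max(h[i - 1][j] + gap, h[i][j - 1] + gap)));
                h[i][j] = v;
                if (v > best) { best = v; bi = i; bj = j; }
            }
        }

        // Trace back from the best cell until a zero cell is reached.
        StringBuilder q = new StringBuilder(), s = new StringBuilder();
        int i = bi, j = bj;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            int sub = query.charAt(i - 1) == subject.charAt(j - 1) ? match : mismatch;
            if (h[i][j] == h[i - 1][j - 1] + sub) {
                q.append(query.charAt(i - 1)); s.append(subject.charAt(j - 1)); i--; j--;
            } else if (h[i][j] == h[i - 1][j] + gap) {
                q.append(query.charAt(i - 1)); s.append('-'); i--;
            } else {
                q.append('-'); s.append(subject.charAt(j - 1)); j--;
            }
        }
        return new Result(best, i, bi, j, bj, q.reverse().toString(), s.reverse().toString());
    }

    public static void main(String[] args) {
        // Illustrative scoring values only: match +2, mismatch -1, gap -2.
        Result r = align("ACGT", "TTACGTTT", 2, -1, -2);
        System.out.println("score=" + r.score
                + " query=[" + r.queryStart + "," + r.queryEnd + ")"
                + " subject=[" + r.subjectStart + "," + r.subjectEnd + ")");
        System.out.println(r.alignedQuery);
        System.out.println(r.alignedSubject);
    }
}
```

Because the two aligned strings cover the same region, they can be printed directly below each other without shifting, and the coordinates remain available for higher-level computation. The full score matrix used here is quadratic in memory, so this form would not scale to aligning short reads against a genome; it only illustrates the shape of the result.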
