JESS: request for comment: using JESS in bioinformatics

Paul Shannon Thu, 19 Jun 2003 04:56:44 -0700

I hope to get comments from the JESS community on the pros and
cons of using JESS on a problem I face in bioinformatics.


I have used JESS in a few tiny pilot projects, with some success.
But before I do serious development, I'd like to get a critique
from those with greater experience.

Here's the background:  I am involved in the Cytoscape project
(http://www.cytoscape.org), an open source Java tool for
exploring molecular networks.

We often face a data cross-reference problem.  By this I
mean that we work with entities (typically genes, mRNA, or
proteins) for which we have identifiers, and we need to
cross-reference them to other related data.  The related
data often uses a different set of identifiers.  New kinds
of data are appearing all the time.

There are a number of laudable efforts in the biological
community to standardize names, and I use them whenever I
can, but these efforts don't seem to keep pace with the
burgeoning kinds and quantities of biological data we are
interested in.

To make this concrete, here is a recent example:

   1) In a study of prostate cancer, we started with
      microarray measurements identified by UniGene
      cluster ID of human mRNA fragments

   2) I mapped UniGene cluster ID to LocusLink ID for most
      of the mRNA

   3) From LocusLink ID I mapped to HUGO gene symbol, and 
      RefSeq protein identifier

   4) From LocusLink ID, I was also able to get the Enzyme
      Commission (EC) term of the related protein from KEGG.
      (KEGG's EC assignments were better than LocusLink's.)

   5) From RefSeq protein ID, I was able to get the amino acid
      sequence

   6) From amino acid sequence, I was able to BLAST against 
      yeast sequences, and determine the yeast orthologs of
      the human genes, which set the stage for inferring the
      possible 'interactome' context of the human genes 

   7) From the EC number, I was able to map the human genes
      onto KEGG metabolic pathways

   8) From the RefSeq protein ID, I was able to get the IPI number,
      and thus the latest Gene Ontology annotation from the
      GOA project at EBI

This chain of reasoning and cross-referencing needs to be
done for just about every data set I see.  I have developed a bag of
tools (in Java and Python) which partially automate the process.
But extending, managing, and invoking these tools is not
easy.  So I am thinking about adding JESS.
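
To illustrate the kind of chaining described above, here is a minimal
sketch in Python.  It is not one of the existing tools: the table
names, the follow_chain helper, and every identifier in the tables are
made-up placeholders, and real mappings would come from files or
databases rather than in-memory dictionaries.

```python
# Hypothetical lookup tables. Every identifier below is a placeholder,
# not a real UniGene, LocusLink, RefSeq, or EC record.
UNIGENE_TO_LOCUSLINK = {"Hs.0001": "LL:1001"}
LOCUSLINK_TO_REFSEQ = {"LL:1001": "NP_000001"}
LOCUSLINK_TO_EC = {"LL:1001": "EC 0.0.0.0"}


def follow_chain(start_id, tables):
    """Apply each mapping table in turn; return None if any link is missing."""
    current = start_id
    for table in tables:
        current = table.get(current)
        if current is None:
            return None
    return current


# Two chains that share their first link, as in steps 2 to 4 above.
refseq = follow_chain("Hs.0001", [UNIGENE_TO_LOCUSLINK, LOCUSLINK_TO_REFSEQ])
ec = follow_chain("Hs.0001", [UNIGENE_TO_LOCUSLINK, LOCUSLINK_TO_EC])

# An identifier with no mapping stops the chain early.
missing = follow_chain("Hs.9999", [UNIGENE_TO_LOCUSLINK, LOCUSLINK_TO_REFSEQ])
```

Hard-coding the order of tables for each chain is what becomes hard to
manage as new identifier types appear.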

I am drawn to JESS because my process seems to consist of the
following steps:

   1) defining rules that perform fairly simple operations

   2) applying the rules as needed

   3) adding new rules all the time, to accommodate the latest
      kind of experimental data and desired cross-reference

   4) caching results to avoid repetitious labor
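
The four steps above can be sketched in plain Python.  This is not
JESS and not a Rete-based engine; it only shows the shape of the
problem: rules registered as (source kind, target kind) pairs, a
breadth-first search to chain them, and a cache of results.  The
CrossReferencer class, the kind names, and the identifiers are all
hypothetical.

```python
from collections import deque


class CrossReferencer:
    """Toy rule registry: chains single-step mapping rules and caches results."""

    def __init__(self):
        self.rules = {}   # (source_kind, target_kind) -> mapping function
        self.cache = {}   # (source_kind, target_kind, identifier) -> result

    def add_rule(self, source_kind, target_kind, func):
        """Step 1 and 3: define a simple rule; new ones can be added any time."""
        self.rules[(source_kind, target_kind)] = func

    def _find_path(self, source_kind, target_kind):
        """Breadth-first search for a shortest sequence of rules."""
        queue = deque([(source_kind, [])])
        seen = {source_kind}
        while queue:
            kind, path = queue.popleft()
            if kind == target_kind:
                return path
            for (src, dst) in self.rules:
                if src == kind and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, path + [(src, dst)]))
        return None  # no chain of rules connects the two kinds

    def resolve(self, source_kind, target_kind, identifier):
        """Step 2 and 4: apply rules as needed, reusing cached answers."""
        key = (source_kind, target_kind, identifier)
        if key in self.cache:
            return self.cache[key]
        path = self._find_path(source_kind, target_kind)
        value = identifier if path is not None else None
        for step in path or []:
            value = self.rules[step](value)
            if value is None:
                break  # a rule had no answer for this identifier
        self.cache[key] = value
        return value


# Usage with placeholder data; `calls` records how often the first rule runs.
calls = []


def unigene_to_locuslink(uid):
    calls.append(uid)
    return {"Hs.0001": "LL:1001"}.get(uid)


xref = CrossReferencer()
xref.add_rule("unigene", "locuslink", unigene_to_locuslink)
xref.add_rule("locuslink", "refseq", {"LL:1001": "NP_000001"}.get)

first = xref.resolve("unigene", "refseq", "Hs.0001")
second = xref.resolve("unigene", "refseq", "Hs.0001")  # served from the cache
```

A rule engine would replace the hand-written path search with pattern
matching over facts, which is the part I hope JESS can take over.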

Is this a good project for JESS?  I will be grateful for
your replies!

Regards,

   Paul Shannon
   Software Engineer
   Institute for Systems Biology
   Seattle

