In general, you just need to denorm the data and create a list of Genes, and add each Genes' related information by SQLs. Ranking can be easily adjusted via each field's weight, not a big deal.
Seems an ideal case for using DBSight. It can also do incremental indexing, which you may also need. -- Chris Lu ------------------------- Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Jan 24, 2008 5:42 AM, <[EMAIL PROTECTED]> wrote: > Hi, > > (Warning, not for the weak-hearted) > > I'm currently working on a project where we have a large and complex data > model, related to Genomics. We are trying to build a search engine that > provides "full text" and "field-based text" searches for our customer base > (mostly academic research), and are evaluating different tools for this > purpose. > > As a starting point, we have, as an example, a set of objects (stored in > tables as a relational model): > Gene [ID, Symbol, Description] > Article - M:M with Gene [ID, Title] > Disease - M:M with Gene [ID, Name] > Author - M:M with Article [ID, Name] > (Note: M:M tables exist, just link IDs) > > An example model would be (hierarchical, relations dealt with as > duplications) > > Gene [ID=1, Symbol=EGFR, Description=epidermal growth factor receptor] > Article [ID=1, Title=EGFR mutations in lung cancer: correlation with > clinical response to gefitinib therapy] > Author [ID=1, Name=H. Michaelson] > Author [ID=2, Name=J. Watson] > Article [ID=2, Title=Proteomics analysis of epidermal protein kinases > by target class-selective prefractionation and tandem mass > spectrometry] > Author [ID=1, Name=H. Michaelson] > Author [ID=3, Name=M. Roberts] > Disease [ID=1, Name=Epidermal sluffing] > > Gene [ID=2, Symbol=AHCY, Description=S-adenosylhomocysteine hydrolase] > Article [ID=3, Title=Limited proteolysis of S-adenosylhomocysteine > hydrolase: implications for the three-dimensional structure] > Author [ID=4, Name=B. Cohen] > Author [ID=5, Name=L. Alexander] > Article [ID=2, Title=Proteomics analysis of epidermal protein kinases > by target class-selective prefractionation and tandem mass > spectrometry] > Author [ID=1, Name=H. Michaelson] > Author [ID=3, Name=M. Roberts] > > Note IDs in the objects above, as they relay the relations in the > hierarchical model. > > In our Full-Text search, we would like to allow users to search ANY > textual field for any string. For instance, the term "epidermal", and > display the list of genes which have any data associated with them with > that term (ranked, of course). > Our list of results would be something like: > > EGFR > Found in Description (epidermal growth factor receptor) > Found in Article ID#2, in Title (proteomics analysis of epidermal > protein kinases by target class-selective prefractionation and tandem > mass spectrometry) > Found in Disease ID#1, in Name (Epidermal sluffing) > > AHCY > Found in Article ID#2, in Title (proteomics analysis of epidermal > protein kinases by target class-selective prefractionation and tandem > mass spectrometry) > > Note that the results retain a hierarchial view of our Genes (us being > Gene-Centric, we're pretty much framing the question "find this term > related in information related to those genes"). Also note that Article ID > #2 has an M:M with Gene ID2 (AHCY) and Gene ID1 (EGFR), and only due to > that fact, AHCY is considered a gene that has "epidermal" in its > annotations. > > Obviously, we'd like to rank fields by location in hierarchy (A term in a > gene name is scored higher than the name of the author of an article > related to a gene) and by number of hits (number of times a term is found > related to that gene, 3 in the case of EGFR above). > > Ideas for how to take on this challenge? Implementation? Tools? > > Thanks! > Yaron Golan > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]