In general, you just need to denormalize the data: create a list of
Genes, and attach each Gene's related information (articles, authors,
diseases) with SQL queries. Ranking can then be adjusted through each
field's weight, so that part is not a big deal.
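
To make the idea concrete, here is a rough sketch in plain Python with
SQLite. It is not DBSight or Lucene code. The link-table names
(gene_article, gene_disease, article_author), the weight values and the
function names are all my own assumptions for illustration, and the
scoring is a simple weighted substring match rather than real full-text
ranking.

```python
import sqlite3

# Assumed weights: fields closer to the Gene itself count more (illustrative).
WEIGHTS = {"symbol": 8.0, "description": 4.0, "disease": 2.0,
           "article": 2.0, "author": 1.0}

# Sample rows copied from the example model; link-table names are assumed.
SCHEMA_AND_DATA = """
CREATE TABLE gene(id INTEGER PRIMARY KEY, symbol TEXT, description TEXT);
CREATE TABLE article(id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE disease(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE author(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene_article(gene_id INTEGER, article_id INTEGER);
CREATE TABLE gene_disease(gene_id INTEGER, disease_id INTEGER);
CREATE TABLE article_author(article_id INTEGER, author_id INTEGER);
INSERT INTO gene VALUES
  (1, 'EGFR', 'epidermal growth factor receptor'),
  (2, 'AHCY', 'S-adenosylhomocysteine hydrolase');
INSERT INTO article VALUES
  (1, 'EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy'),
  (2, 'Proteomics analysis of epidermal protein kinases by target class-selective prefractionation and tandem mass spectrometry'),
  (3, 'Limited proteolysis of S-adenosylhomocysteine hydrolase: implications for the three-dimensional structure');
INSERT INTO disease VALUES (1, 'Epidermal sluffing');
INSERT INTO author VALUES (1, 'H. Michaelson'), (2, 'J. Watson'),
  (3, 'M. Roberts'), (4, 'B. Cohen'), (5, 'L. Alexander');
INSERT INTO gene_article VALUES (1, 1), (1, 2), (2, 3), (2, 2);
INSERT INTO gene_disease VALUES (1, 1);
INSERT INTO article_author VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 4), (3, 5);
"""

# One query per related field; each returns the text values for one gene.
RELATED = {
    "article": """SELECT a.title FROM article a
                  JOIN gene_article ga ON ga.article_id = a.id
                  WHERE ga.gene_id = ? ORDER BY a.id""",
    "disease": """SELECT d.name FROM disease d
                  JOIN gene_disease gd ON gd.disease_id = d.id
                  WHERE gd.gene_id = ? ORDER BY d.id""",
    "author": """SELECT DISTINCT au.name FROM author au
                 JOIN article_author aa ON aa.author_id = au.id
                 JOIN gene_article ga ON ga.article_id = aa.article_id
                 WHERE ga.gene_id = ? ORDER BY au.name""",
}


def build_db():
    """Create an in-memory database holding the sample model."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA_AND_DATA)
    return db


def gene_documents(db):
    """Denormalize: one document per gene, each field a list of strings."""
    docs = []
    genes = db.execute(
        "SELECT id, symbol, description FROM gene ORDER BY id").fetchall()
    for gene_id, symbol, description in genes:
        doc = {"symbol": [symbol], "description": [description]}
        for field, sql in RELATED.items():
            doc[field] = [row[0] for row in db.execute(sql, (gene_id,))]
        docs.append(doc)
    return docs


def search(docs, term):
    """Return (score, gene symbol, hits) sorted by descending score.

    The score is the sum of the field weight over every matching value,
    so both the field's place in the hierarchy and the hit count matter.
    """
    term = term.lower()
    results = []
    for doc in docs:
        hits = [(field, value) for field in WEIGHTS
                for value in doc[field] if term in value.lower()]
        if hits:
            score = sum(WEIGHTS[field] for field, _ in hits)
            results.append((score, doc["symbol"][0], hits))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


docs = gene_documents(build_db())
results = search(docs, "epidermal")
for score, symbol, hits in results:
    print(symbol, score)
    for field, value in hits:
        print("  Found in", field, "(" + value + ")")
```

In a real setup the per-gene documents would be fed to the indexer, and
the weights would become per-field boosts instead of this hand-rolled
sum.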

Seems an ideal case for using DBSight. It can also do incremental
indexing, which you may also need.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request)
got 2.6 Million Euro funding!


On Jan 24, 2008 5:42 AM,  <[EMAIL PROTECTED]> wrote:
> Hi,
>
> (Warning, not for the weak-hearted)
>
> I'm currently working on a project where we have a large and complex data
> model, related to Genomics. We are trying to build a search engine that
> provides "full text" and "field-based text" searches for our customer base
> (mostly academic research), and are evaluating different tools for this
> purpose.
>
> As a starting point, we have, as an example, a set of objects (stored in
> tables as a relational model):
> Gene [ID, Symbol, Description]
> Article - M:M with Gene [ID, Title]
> Disease - M:M with Gene [ID, Name]
> Author - M:M with Article [ID, Name]
> (Note: M:M tables exist, just link IDs)
>
> An example model would be (hierarchical, relations dealt with as
> duplications)
>
>   Gene [ID=1, Symbol=EGFR, Description=epidermal growth factor receptor]
>     Article [ID=1, Title=EGFR mutations in lung cancer: correlation with
> clinical response to gefitinib therapy]
>       Author [ID=1, Name=H. Michaelson]
>       Author [ID=2, Name=J. Watson]
>     Article [ID=2, Title=Proteomics analysis of epidermal protein kinases
> by target class-selective prefractionation and tandem mass
> spectrometry]
>       Author [ID=1, Name=H. Michaelson]
>       Author [ID=3, Name=M. Roberts]
>     Disease [ID=1, Name=Epidermal sluffing]
>
>   Gene [ID=2, Symbol=AHCY, Description=S-adenosylhomocysteine hydrolase]
>     Article [ID=3, Title=Limited proteolysis of S-adenosylhomocysteine
> hydrolase: implications for the three-dimensional structure]
>       Author [ID=4, Name=B. Cohen]
>       Author [ID=5, Name=L. Alexander]
>     Article [ID=2, Title=Proteomics analysis of epidermal protein kinases
> by target class-selective prefractionation and tandem mass
> spectrometry]
>       Author [ID=1, Name=H. Michaelson]
>       Author [ID=3, Name=M. Roberts]
>
> Note the IDs in the objects above, as they convey the relations in the
> hierarchical model.
>
> In our Full-Text search, we would like to allow users to search ANY
> textual field for any string, for instance the term "epidermal", and
> display the list of genes that have any associated data containing
> that term (ranked, of course).
> Our list of results would be something like:
>
> EGFR
>   Found in Description (epidermal growth factor receptor)
>   Found in Article ID#2, in Title (proteomics analysis of epidermal
> protein kinases by target class-selective prefractionation and tandem
> mass spectrometry)
>   Found in Disease ID#1, in Name (Epidermal sluffing)
>
> AHCY
>   Found in Article ID#2, in Title (proteomics analysis of epidermal
> protein kinases by target class-selective prefractionation and tandem
> mass spectrometry)
>
> Note that the results retain a hierarchical view of our Genes (being
> Gene-centric, we are essentially asking "find this term in the
> information related to those genes"). Also note that Article ID #2 has an
> M:M with Gene ID 2 (AHCY) and Gene ID 1 (EGFR), and only because of that
> is AHCY considered a gene that has "epidermal" in its annotations.
>
> Obviously, we'd like to rank fields by location in hierarchy (A term in a
> gene name is scored higher than the name of the author of an article
> related to a gene) and by number of hits (number of times a term is found
> related to that gene, 3 in the case of EGFR above).
>
> Ideas for how to take on this challenge? Implementation? Tools?
>
> Thanks!
> Yaron Golan
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
