That's really useful, Sul.
I wasn't aware of some of these tools.

>
> I am also interested in this work. I haven't tried ChemBERT. I should give
> some a shot and do a little comparison. Would be a good lecture too.
>

That would be very useful.


> I was using molminer a little while ago built on ORSA.
>

Do you mean OSRA?
Last time I looked (several years ago) OSRA had to be compiled, or you
could pay for a binary (partly because the compilation wasn't trivial).

>
> https://github.com/gorgitko/molminer
>

This looks like a useful package (I haven't used it).

>
>
> and I think I took a divergent path from automated tooling for now.  I'm
> working on this mapping for Cannabis Sativa I haven't figured out how to
> map the relationship to the phenology perhaps by country of origin?
> functional group?
> I did it manually. I use it as a reference index here:
>
>
> https://github.com/Sulstice/global-chem/blob/development/global_chem/global_chem/medicinal_chemistry/cannabinoids/constituents_of_cannabis_sativa.py
>
> I was thinking I could use this list as a master name as indexes in
> searching other papers.
>

Most frequently occurring compounds are now in well-maintained repositories
such as ChEBI, PubChem, Wikidata, etc. You shouldn't have to create SMILES
for these as you can download them. (Also, you have a few proteins; it's not
normally useful to create SMILES for those.)
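
As a rough illustration of downloading rather than hand-writing SMILES, here
is a minimal sketch against PubChem's PUG REST interface. The property name
"CanonicalSMILES" and the exact shape of the JSON response are my
assumptions and may differ in current PubChem releases, so the parser
accepts any key ending in "SMILES". The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_url(name, prop="CanonicalSMILES"):
    """Build a PUG REST URL requesting a SMILES property by compound name.

    The property name is an assumption; check the PubChem docs.
    """
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{prop}/JSON"


def parse_smiles(payload):
    """Return the first SMILES-like value in a property response, or None."""
    rows = payload.get("PropertyTable", {}).get("Properties", [])
    for row in rows:
        for key, value in row.items():
            # Accept any SMILES-flavoured key, since the name may vary.
            if key.endswith("SMILES"):
                return value
    return None


def fetch_smiles(name):
    """Look up one compound name over the network (not run on import)."""
    with urlopen(smiles_url(name), timeout=30) as response:
        return parse_smiles(json.load(response))
```

Calling fetch_smiles("cannabidiol") for each name in the list would replace
the manual step, assuming the names resolve in PubChem; names that don't
resolve will raise an HTTP error and need handling.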


> Let me know any thoughts.
>

There are roughly two approaches:
* supervised, which requires lists of chemicals, annotated/labelled data,
etc.
* unsupervised, where we look for patterns in the data, including word
embeddings
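
To make the list-based end of this concrete, here is a minimal sketch that
uses two small term lists and counts which chemical and plant names occur in
the same sentence. The vocabularies, function names and sentence splitter
are all hypothetical placeholders; a real system would use curated lists and
a proper tokenizer, and co-occurrence only suggests a candidate pair, it
does not name the relationship.

```python
import re
from collections import Counter

# Hypothetical vocabularies; in practice these would come from curated
# resources rather than being typed in by hand.
CHEMICALS = {"cannabidiol", "limonene", "myrcene"}
PLANTS = {"cannabis sativa", "citrus limon"}


def find_terms(sentence, vocabulary):
    """Return vocabulary terms found as whole words in the sentence."""
    lowered = sentence.lower()
    return sorted(
        term for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def candidate_pairs(text):
    """Count (chemical, plant) pairs that share a sentence."""
    pairs = Counter()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for chemical in find_terms(sentence, CHEMICALS):
            for plant in find_terms(sentence, PLANTS):
                pairs[(chemical, plant)] += 1
    return pairs


text = (
    "Cannabidiol was isolated from Cannabis sativa. "
    "Limonene is abundant in Citrus limon and Cannabis sativa. "
    "Myrcene is a terpene."
)
pairs = candidate_pairs(text)
```

Each counted pair is only a candidate for a <chemical> <relationship>
<plant> triple; finding the relationship word itself is a separate step.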

P.

>
> Cheers,
> -Sul
>
>
> On Thu, Jan 19, 2023 at 6:26 AM Peter Murray-Rust <pm...@cam.ac.uk> wrote:
>
>> What are the current Open Source tools for recognising chemical entities
>> in text? OSCAR still runs but is probably somewhat overtaken by more
>> recent language models. I see that HuggingFace has "ChemBERT" - does anyone
>> have experience?
>>
>> More generally we want to extract triples of the form:
>> <chemical> <relationship> <plant>
>> We plan to do chemicals and plants and then look for relationships. But
>> maybe people have already done this.
>>
>> TIA
>>
>> P.
>>
>> --
>> "I always retain copyright in my papers, and nothing in any contract I
>> sign with any publisher will override that fact. You should do the same".
>>
>> Peter Murray-Rust
>> Reader Emeritus in Molecular Informatics
>> Yusuf Hamied Department of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-336432
>> _______________________________________________
>> Blueobelisk-discuss mailing list
>> Blueobelisk-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
>>
>
>
> --
> *Suliman Sharif*
> Ph.D. Candidate Pharmaceutical Sciences | University of Maryland, School
> of Pharmacy
> M.Sc Medicinal Chemistry | University of California, Riverside School of
> Medicine
> B.Sc. Biochemistry | University of Texas at Austin
> sharifsulim...@gmail.com
>


-- 
"I always retain copyright in my papers, and nothing in any contract I sign
with any publisher will override that fact. You should do the same".

Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Yusuf Hamied Department of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-336432
