In contentmine.org we are close to being able to extract Open facts from
the literature on a daily basis.  We intend to do all journals, including
closed access, but have started with Open to develop the technology. To
test this we are searching EuropePubMedCentral for facts such as sequences,
genes, species, etc. See https://www.youtube.com/watch?v=5lYzOZ2Cv_I for a
5 minute video about Zika.

It's relatively straightforward to add chemistry to this - we have OSCAR and
ChemicalTagger, which can analyze chunks of running text. And there are
tools for extraction from diagrams. All facts would, of course, be Open CC0.

What sort of things would we like to see extracted from the literature, and
in what context? Possible examples are:
- drugs in a disease context (though diseases are not always well defined)
- phytochemistry (this should be straightforward; we can already
extract plant species)
- environment (probably a lookup list)
- chemical syntheses (cf ChemicalTagger http://chemicaltagger.ch.cam.ac.uk)

In that context I'd be very interested in simple lists of chemicals that we
can use for searching:
- INNs for drugs
- ChEBI entries (PubChem and ChemSpider are too large)
- pesticides and herbicides
- environmental chemicals (VOCs, etc.)
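
As a rough illustration of how such simple lists could be used for
searching, here is a minimal Python sketch. The term lists, category names
and function names are assumptions made up for this example and are not
part of the ContentMine tools; real lists would be far longer and would
need synonyms and spelling variants.

```python
import re

# Illustrative term lists only (assumption): real lists would be built from
# sources such as INNs, ChEBI, or pesticide and VOC compilations.
TERM_LISTS = {
    "drug": ["paracetamol", "ibuprofen"],
    "pesticide_or_herbicide": ["glyphosate", "atrazine"],
    "voc": ["benzene", "toluene"],
}


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for a list of terms."""
    # Longest terms first, so a longer name is preferred over a shorter
    # name that is a prefix of it.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def find_chemicals(text, term_lists=TERM_LISTS):
    """Return (offset, lower-cased name, category) tuples, sorted by offset."""
    hits = []
    for category, terms in term_lists.items():
        for match in build_pattern(terms).finditer(text):
            hits.append((match.start(), match.group(0).lower(), category))
    return sorted(hits)


sample = "Glyphosate residues were compared with benzene and toluene levels."
for offset, name, category in find_chemicals(sample):
    print(offset, name, category)
```

Matching on whole words keeps a short name from firing inside a longer
one (for example "benzene" inside "benzenesulfonic"), which is one reason
a curated list is easier to work with than a very large one.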

And wherever possible we want to be able to link them back to Wikidata
identifiers. So another list is:
- chemicals in Wikipedia

P.


-- 

Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
