Dear all,

Following on from Peter's email, it may be worth mentioning our joint project CheTA (Chemistry using Text Annotations, http://www.nactem.ac.uk/cheta/). A demo can be found here: http://www.nactem.ac.uk/software/cheta/

Workflows can be created using U-Compare (http://u-compare.org/), which contains a repository of NLP-based text mining technology (http://u-compare.org/components/index.html).

I hope this is of interest to some of you.

Best wishes,
Sophia

On 8 Dec 2010, at 10:34, Peter Murray-Rust wrote:
> I'd like to present two of our projects (Quixote
> (http://quixote.wikispot.org/Front_Page) and GreenChainReaction
> (http://scienceonlinelondon.wikidot.com/topics:green-chain-reaction)), both
> of which are aimed at creating semantically enriched data objects in physical
> science. (I think there are important and valuable technical differences
> between how physical scientists think about data and semantics and how, say,
> bio/medical scientists do.)
>
> Both are bottom-up projects in that they involve web-based contributors
> without an overarching coordinating body. They are open science (all the work
> is completely available on the Net as soon as it is published). They also
> build their semantics "bottom-up", i.e. they look at what "discourse" is used
> in the domain and try to formalize it. There are probably about 30 people
> involved (and there will be more by January 17th), so it doesn't make sense
> to give an author list, but the projects themselves will of course list
> contributors.
>
> These projects are disruptive technology in the same sense that Wikipedia or
> Wikileaks are disruptive. (Clay Shirky was lamenting on UK TV on 2010-12-07
> that the reaction to WL was via extra-legal methods.) I don't want to
> re-enter my polemics, but it is factually correct that the established
> organizations in physical science (most publishers, most learned societies,
> some universities, some funders) are indifferent or antagonistic. If BTPDF
> ignores this then its results can only be cosmetic.
> I believe it's factually true to say that text-mining is currently crippled
> by the lack of access to freely available and Open scientific content, and
> that this must be redressed. I have tried to engage with 3-4 major (closed)
> publishers of chemistry over 5 years, and the only thing I have achieved is a
> small corpus for testing purposes under CC-NC from one. One hasn't bothered
> to reply. Therefore chemistry will either remain a semantic desert or there
> will be a bottom-up revolution.
>
> So far I seem to be the only one addressing item 4 (IPR).
>
> On the more positive side, we will succeed in our bottom-up projects to
> create semantics and ontologies for chemical objects and discourse. In
> GreenChainReaction we analysed ca. 10,000 patents from the EPO and carried out
> semantically based text mining at a medium depth level (i.e. entity
> recognition, phrase recognition and default tree-banking). This showed that a
> deeper level of NLP gives much better precision than textual entity
> recognition alone (which is often too imprecise to be useful). We shall be
> re-running this exercise and presenting the results at BTPDF, where we shall
> be using USPTO patents to create about 200,000-500,000 reactions in complete
> semantic form. This will, we believe, have advantages over the current
> commercial extraction of chemistry into reaction databases; unfortunately,
> publishers forbid us to apply the technology to research articles and
> publish the results. So GCR builds up a resource of all objects published in
> chemical reactions, and this should allow us to create a complete discourse
> ontology of reactions. (BTW, anyone interested in text-mining will be welcome
> to take part.)
>
> GCR is an after-the-fact markup, although the technology could, in
> principle, be used in the authoring process. It's a question of communal
> will, not technology.
>
> Quixote represents semantics-at-source and marks up the output of
> computational chemistry calculations.
> It's common to publish "articles" which
> just describe calculations, though it's also common to find them as support
> for experimental work. Almost invariably the detailed results are never
> published, though it's trivial to do so and space is not a problem.
>
> The reason for this problem is purely cultural and commercial. Most
> calculations are carried out by closed-source, for-money programs, and there
> is an implicit policy of non-interoperability at the syntactic, semantic and
> ontological levels. The companies compete at least partially through lock-in
> and inertia, which means there is no incentive to create an ontology.
>
> Quixote believes that there *is* an underlying stable ontology and that by
> using the common programs, and exposing their results in semantic form
> (Chemical Markup Language), we will be able to create a core ontological
> abstraction. This is not as ambitious as it seems: the equations and
> fundamental physics are universal and have been stable for about 80 years or
> more. By creating this ontology it will be possible to add annotation at the
> time data are emitted from the calculation. It means that all calculations
> (we guess about 100 million per year or more) will be available to the whole
> community as Open data. And again, anyone can join in.
>
> These projects tick boxes 1.1, 1.2, 2.2, 2.3 and 2.4. They also show in great
> detail two enthusiastic communities working on Use Cases (box 3).
>
> Please let me know if this needs editing and, if not, add it to the workshop
> papers under
>
> Bottom-up semantics and ontologies
>
> Peter Murray-Rust,
> members of The Quixote Project
> members of The GreenChainReaction
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dept. of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069

Professor Sophia Ananiadou
School of Computer Science
Director, National Centre for Text Mining
Manchester Interdisciplinary Biocentre
University of Manchester
131 Princess Street, M1 7DN
www.nactem.ac.uk
sophia.anania...@manchester.ac.uk
tel: +44 161 306 3092
------------------------------------------------------------------------------
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss