Dear all,

Following from Peter's email, it may be worth mentioning our joint project 
CheTA (Chemistry using Text Annotations--http://www.nactem.ac.uk/cheta/).

A demo can be found here: http://www.nactem.ac.uk/software/cheta/
Workflows can be created using U-Compare (http://u-compare.org/), which contains a 
repository of NLP-based text mining technology 
(http://u-compare.org/components/index.html).

I hope this is of interest to some of you.

Best wishes,

Sophia


On 8 Dec 2010, at 10:34, Peter Murray-Rust wrote:

> I'd like to present two of our projects (Quixote 
> (http://quixote.wikispot.org/Front_Page) and GreenChainReaction 
> (http://scienceonlinelondon.wikidot.com/topics:green-chain-reaction)), both 
> of which are aimed at creating semantically enriched data objects in physical 
> science. (I think there are important and valuable technical differences between 
> how physical scientists think about data and semantics and how - say - 
> bio/medical scientists do). 
> 
> Both are bottom-up projects in that they involve web-based contributors 
> without an overarching coordinating body. They are open science (all the work 
> is completely available on the Net as soon as it is published). They also 
> build their semantics "bottom-up" - i.e. look to see what "discourse" is used 
> in the domain and try to formalize this. There are probably about 30 people 
> involved (and there will be more by January 17th) so it doesn't make sense to 
> give an author list - but the projects themselves will of course list 
> contributors.
> 
> These projects are disruptive technology in the same sense that Wikipedia or 
> Wikileaks are disruptive. (Clay Shirky was lamenting on UK TV 2010-12-07 that 
> the reaction to WL was via extra-legal methods). I don't want to re-enter my 
> polemics but it is factually correct that the established organizations in 
> physical science (most publishers, most learned socs, some univs, some 
> funders) are indifferent or antagonistic. If BTPDF ignores this then its 
> results can only be cosmetic. I believe that it's factually true to say that 
> text-mining is currently crippled by the lack of access to freely available 
> and Open scientific content and that this must be redressed. I have tried to engage 
> with 3-4 major (closed) publishers of chemistry over 5 years and the only 
> thing I have achieved is a small corpus for testing purposes under CC-NC from 
> one. One hasn't bothered to reply. Therefore chemistry will either remain a 
> semantic desert or there will be a bottom-up revolution. 
> 
> So far I seem to be the only one addressing item 4 (IPR). 
> 
> On the more positive side we will succeed in our bottom-up projects to create 
> semantics and ontologies for chemical objects and discourse. In 
> GreenChainReaction we analysed ca 10,000 patents from the EPO and carried out 
> semantically based text mining at a medium depth level (i.e. entity 
> recognition, phrase recognition and default tree-banking). This showed that a 
> deeper level of NLP gives much better precision over textual entity 
> recognition (which is often too imprecise to be useful). We shall be 
> re-running this exercise and presenting the results at BTPDF where we shall be 
> using USPTO patents to create about 200-500,000 reactions in complete 
> semantic form. This will - we believe - have advantages over the current 
> commercial extraction of chemistry into reaction databases - unfortunately 
> publishers forbid us to apply the technology to research articles and publish 
> the results. So GCR builds up a resource of all objects published in chemical 
> reactions and this should allow us to create a complete discourse ontology of 
> reactions. (BTW anyone interested in text-mining will be welcome to take 
> part).
> 
> GCR is an after-the-fact markup although the technology could - in principle 
> - be used in the authoring process. It's a question of communal will, not 
> technology.
> 
> Quixote represents semantics-at-source and marks up the output of 
> computational chemistry calculations. It's common to publish "articles" which 
> just describe calculations, though it's also common to find them as support 
> for experimental work. Almost invariably the detailed results are never 
> published though it's trivial to do so and the space is not a problem. 
> 
> The reason for this problem is purely cultural and commercial. Most 
> calculations are carried out by closed source for-money programs and there is 
> an implicit policy of non-interoperability at the syntax, semantic and 
> ontological level. The companies compete at least partially through lock-in 
> and inertia which means there is no incentive to create an ontology.
> 
> Quixote believes that there *is* an underlying stable ontology and that by 
> using the common programs, and exposing their results in semantic form 
> (Chemical Markup Language) we will be able to create a core ontological 
> abstraction. This is not as ambitious as it seems - the equations and 
> fundamental physics are universal and have been stable for about 80 years or more. By 
> creating this ontology it will be possible to add annotation at the time 
> data are emitted from the calculation. It means that all calculations (we 
> guess about 100 million per year or more) will be available to the whole 
> community as Open data. And again anyone can join in.
> 
> These projects tick boxes 1.1, 1.2, 2.2, 2.3, 2.4. They also show in great 
> detail two enthusiastic communities working on Use Cases (box 3)
> 
> Please let me know if this needs editing and if not add it to the workshop 
> papers under
> 
> Bottom-up semantics and ontologies
> 
> Peter Murray-Rust, 
> members of The Quixote Project 
> members of The GreenChainReaction
> 
> 
> 
> -- 
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069

Professor Sophia Ananiadou, School of Computer Science,
Director, National Centre for Text Mining,
Manchester Interdisciplinary Biocentre
University of Manchester
131 Princess Street, M1 7DN
www.nactem.ac.uk
sophia.anania...@manchester.ac.uk
tel: +44 161 306 3092




------------------------------------------------------------------------------
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
