The discussion on InChIs raises the question as to who creates and
manages communal resources and metadata. InChIs work because they are
algorithmic, but they fail for inorganics (especially mineral
polymorphs) and for substances ("glucose", "glutamate", etc.), where
conventional human-assigned identifiers are possible. If the
substances are sufficiently common they will be in Wikipedia and that
should work as an excellent mechanism, and if they are in PubChem
that is also a possibility. But if they are new - "proposed molecule
X" or in a publication - then we need something else.
A similar problem arises with images for structures. There are
several types, but specifically (a) semantic, often ugly but
machine-compatible and (b) pretty - cf TotallySynth - often with
unclear machine semantics (e.g. perspective). Both are needed - the
MathML community also has this problem.
Wikipedia solves the image problem by providing a repository of
images and allowing multiple link throughs. Two years ago I thought
about providing an image drawing service for blogs - draw once, use
many. If, say, we had JChemPaint mounted on our server anyone could
draw their image for the blog and link to it. The killer was that
(a) we couldn't provide a robust server and (b) demand might overwhelm it.
But now we have unlimited free storage everywhere. So what about:
(a) there are a number of molecule drawing sites. (Obviously we can't
provide ChemDraw, but most others would allow it - Marvin, JME, ACD).
The author would draw a structure and - for organics - get the
InChI. The service would immediately search the BO server space for
the identical InChI (or, excitingly, any InChI related by layers). If
it found other InChIs it could display these to the author, who might
wish to use one with, say, fuller stereochemistry. Or a prettier
version (e.g. for macrocycles). There might even be some clever
language processing - e.g. paste a name from a journal and get the
structure - Peter has been looking into that.
(b) the software generates an image and names it with a unique name.
Probably not the InChI but either PubChem CID or a BO ID (see below).
Then we post it to Flickr or Google or wherever. These sites remain
stable so that authors could link to them from their blog. It would
depend on the blog software how easy it was to download images into
the text - Wordpress seems to do local files but not URLs - unless I
have missed something. But actually cut and paste from remote images
seems to work in many cases.
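
The "related by layers" search in (a) might look something like the
following rough sketch. The function names, and the choice of formula,
/c and /h as the "core" layers, are my assumptions for illustration and
not part of the InChI software. The parser keeps only the first
occurrence of each layer prefix, so it ignores the repeated layers that
can follow /f or /i.

```python
def split_layers(inchi):
    """Split an InChI string into a dict of layers keyed by prefix letter."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]      # the formula layer has no prefix
    for part in parts[2:]:
        if part:
            # Simplification: keep the first occurrence of each prefix only.
            layers.setdefault(part[0], part[1:])
    return layers


# Assumption: two InChIs are "related" if they agree on these layers.
CORE = ("formula", "c", "h")


def core_key(inchi):
    """Return the core layers as a tuple usable for comparison or indexing."""
    layers = split_layers(inchi)
    return tuple(layers.get(name, "") for name in CORE)


def related_by_layers(a, b):
    """True if both InChIs share the same core layers."""
    return core_key(a) == core_key(b)


def find_related(query, known):
    """List known InChIs that differ from the query but share its core."""
    return [k for k in known if k != query and related_by_layers(query, k)]


ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
L_ALA = ALA + "/t2-/m0/s1"
D_ALA = ALA + "/t2-/m1/s1"
GLY = "InChI=1S/C2H5NO2/c3-1-2(4)5/h1,3H2,(H,4,5)"

for hit in find_related(ALA, [L_ALA, D_ALA, GLY, ALA]):
    print(hit)
```

An author who drew alanine without stereochemistry would be offered
both the L and D forms, and could pick the fuller one.
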
So I suggest that we might need a BO identifier. It needs to be
nearly unique. (If it collides once every few years the blogosphere
will forgive us.) I suspect that an MD5 hash or a datetime should be OK
and this could be kept relatively short. Or we could simply ask the
servers to assign ids sequentially and use new ones if they collide.
We can't be the first to do this. We aren't running a bank or a
nuclear power station so a few problems won't matter.
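
As a minimal sketch of the MD5 option: hash the InChI, keep a short
prefix of the digest, and only lengthen it if a different InChI already
holds that ID. The "BO-" prefix, the default length of 10 hex digits and
the function names are all assumptions of mine, and the registry here is
just a dict standing in for whatever the server would keep.

```python
import hashlib


def bo_id(inchi, length=10):
    """Return a short identifier: 'BO-' plus a prefix of the MD5 hex digest."""
    digest = hashlib.md5(inchi.encode("utf-8")).hexdigest()
    return "BO-" + digest[:length]


def register(inchi, registry, length=10):
    """Assign an ID to an InChI, lengthening the digest prefix on collision.

    registry maps ID -> InChI. Registering the same InChI twice returns
    the same ID.
    """
    while length <= 32:                     # an MD5 hex digest has 32 digits
        candidate = bo_id(inchi, length)
        owner = registry.setdefault(candidate, inchi)
        if owner == inchi:
            return candidate
        length += 1                         # held by another InChI: try longer
    raise RuntimeError("full MD5 digest collided for %r" % inchi)


registry = {}
methane = "InChI=1S/CH4/h1H4"
print(register(methane, registry))
```

Ten hex digits is 40 bits, which should comfortably meet the "collides
once every few years" standard at blogosphere volumes.
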
We'd have to have a bidirectional lookup for this identifier: InChI
<==> BO. That could be done with RDF, and would be a fun exercise. If
we get above 100,000 triples we will have succeeded anyway. There are
people who are offering to host triple services for free. We could
probably put it in our institutional repository for indexing chemistry theses.
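
To make the InChI <==> BO lookup concrete, here is a toy sketch that
holds the mapping as triples and writes them out as N-Triples. It uses
no RDF library; the example.org namespace, the hasInChI predicate and
the class name are placeholders I made up, and a real service would sit
on a proper triple store.

```python
BO = "http://example.org/bo/"      # placeholder namespace (assumption)
HAS_INCHI = BO + "hasInChI"        # hypothetical predicate


class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, bo_id, inchi):
        """Record that the BO identifier denotes the given InChI."""
        self.triples.add((BO + bo_id, HAS_INCHI, inchi))

    def inchi_for(self, bo_id):
        """BO ID -> list of InChIs."""
        return [o for s, p, o in self.triples
                if s == BO + bo_id and p == HAS_INCHI]

    def bo_for(self, inchi):
        """InChI -> list of BO IDs."""
        return [s[len(BO):] for s, p, o in self.triples
                if p == HAS_INCHI and o == inchi]

    def to_ntriples(self):
        """Serialise as N-Triples, one line per triple, InChI as a literal."""
        lines = []
        for s, p, o in sorted(self.triples):
            literal = o.replace("\\", "\\\\").replace('"', '\\"')
            lines.append('<%s> <%s> "%s" .' % (s, p, literal))
        return "\n".join(lines)


store = TripleStore()
store.add("BO-0001", "InChI=1S/CH4/h1H4")
print(store.inchi_for("BO-0001"))
print(store.bo_for("InChI=1S/CH4/h1H4"))
print(store.to_ntriples())
```

The linear scans are fine for a sketch; at 100,000 triples one would
want an index in each direction, which a hosted triple service would
provide anyway.
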
Anyway that is a first shot...
P.
Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK
+44-1223-763069
_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss