I think the new InChIKey (or "hashed InChI") should meet this need, without
us having to create a BO identifier:
http://www.iupac.org/inchi/release102.html
Do you think this would work?
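
For anyone who hasn't looked at the release notes yet, the general idea of
a hashed identifier can be sketched roughly as below. This is only an
illustration of the concept: it assumes a SHA-256 digest truncated and
mapped onto capital letters, it is NOT the IUPAC InChIKey algorithm, and it
will not reproduce real InChIKeys. The function name toy_hashed_key, the
default key length, and the example InChI string are made up for this
sketch.

```python
import hashlib
import string


def toy_hashed_key(inchi: str, length: int = 14) -> str:
    """Return a fixed-length, letters-only key derived from an InChI string.

    Illustrative only: this is not the official InChIKey algorithm.
    """
    digest = hashlib.sha256(inchi.encode("utf-8")).digest()
    if not 1 <= length <= len(digest):
        raise ValueError("length must be between 1 and 32")
    letters = string.ascii_uppercase
    # Map each of the first `length` digest bytes onto A-Z.
    return "".join(letters[b % 26] for b in digest[:length])


# Example input (assumed to be an InChI for ethanol; check before reuse).
ethanol = "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"
key = toy_hashed_key(ethanol)
print(key)
```

The point is just that the result is a short, fixed-length string with no
slashes or brackets, which I would expect a search engine to treat as a
single token.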

ACD and JChemPaint have already done versions that generate
"Wikipedia-ready" PNG files with embedded InChIs.  There are problems with
Google finding these on the Web, though, so I think we may need to
consider the InChIKey as a Google-friendly alternative.  (Peter - will
OpenBabel at WWMM be able to produce InChIKeys?)

The Wikimedia Commons does have a small collection of structure drawings. 
I'd like to see this expanded and tagged with something like InChIKeys,
and we could probably get this mostly done over a couple of years if the
chemists on WP support the idea (they probably would).  This collection
needs to be organised, but that's very easy - an afternoon's work.  It
does at least have clear, open copyrights, and easy links to Wikipedia and
to related photos.

http://commons.wikimedia.org/wiki/Main_Page
http://commons.wikimedia.org/wiki/Category:Chemical_structures
http://commons.wikimedia.org/wiki/Category:Chemical_compounds

Could this collection be helpful as at least a small repository?

Martin

Martin A. Walker
Department of Chemistry
SUNY College at Potsdam
Potsdam, NY 13676 USA
+1 (315) 267-2271




peter murray-rust wrote:
> The discussion on InChIs raises the question as to who creates and
> manages communal resources and metadata. InChIs work as they are
> algorithmic but they fail for inorganics (especially mineral
> polymorphs) and substances ("glucose", "glutamate"), etc. where
> conventional human-assigned identifiers are possible. If the
> substances are sufficiently common they will be in Wikipedia and that
> should work as an excellent mechanism, and if they are in Pubchem
> that is also a possibility. But if they are new - "proposed molecule
> X" or in a publication, then we need something else.
>
> A similar problem arises with images for structures. There are
> several types, but specifically (a) semantic, often ugly but
> machine-compatible and (b) pretty - cf TotallySynth - often with
> unclear machine semantics (e.g. perspective). Both are needed - the
> MathML community also has this problem.
>
> Wikipedia solves the image problem by providing a repository of
> images and allowing multiple link throughs.  2 years ago I thought
> about providing an image drawing service for blogs - draw once, use
> many. If, say, we had JChempaint mounted on our server anyone could
> draw their image for the blog and link to it. The killer was that we
> couldn't (a) provide a robust server and (b) demand might kill it.
>
> But now we have unlimited free storage everywhere. So what about:
> (a) there are a number of molecule drawing sites. (Obviously we can't
> provide ChemDraw, but most others would allow it - Marvin, JME, ACD).
> The author would draw a structure and - for organics - get the
> InChI. The service would immediately search the BO server space for
> the identical InChI (or, excitingly, any InChI related by layers). If
> it found other InChIs it could display these to the author, who might
> wish to use one with, say, fuller stereochemistry. Or a prettier
> version (e.g. for macrocycles).  There might even be some clever
> language processing - e.g. paste a name from a journal and get the
> structure - Peter has been looking into that.
> (b) the software generates an image and names it with a unique name.
> Probably not the InChI but either Pubchem CID or a BO ID (see below).
> Then we post it to Flickr or Google or wherever. These sites remain
> stable so that authors could link to them from their blog. It would
> depend on the blog software how easy it was to download images into
> the text - Wordpress seems to do local files but not URLs - unless I
> have missed something. But actually cut and paste from remote images
> seems to work in many cases.
>
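
(A rough sketch of the "related by layers" check in (a), for illustration
only. It assumes that two InChIs count as related when the layers of one
are a leading subset of the layers of the other, which is a simplification
of my own. The function names are hypothetical, and the two strings are
meant to be alanine without and with stereo layers; please check them
before relying on them.)

```python
from typing import List


def inchi_layers(inchi: str) -> List[str]:
    """Split an InChI string into its slash-separated layers."""
    return inchi.split("/")


def related_by_layers(a: str, b: str) -> bool:
    """True if one InChI's layers are a leading subset of the other's.

    Identical InChIs also count as related under this simplified rule.
    """
    shorter, longer = sorted((inchi_layers(a), inchi_layers(b)), key=len)
    return longer[: len(shorter)] == shorter


# Assumed example strings: alanine without and with stereo layers.
alanine_flat = "InChI=1/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
alanine_l = alanine_flat + "/t2-/m0/s1"
ethanol = "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"

print(related_by_layers(alanine_flat, alanine_l))
print(related_by_layers(alanine_flat, ethanol))
```

A real service would presumably compare individual layers more carefully
(for example, matching connectivity while ignoring isotopes), but a prefix
test is enough to show the idea.
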
> So I suggest that we might need a BO identifier. It needs to be
> nearly unique. (If it collides once every few years the blogosphere
> will forgive you). I suspect that an MD5, or a datetime should be OK
> and this could be kept relatively short. Or we could simply ask the
> servers to assign ids sequentially and use new ones if they collide.
> We can't be the first to do this. We aren't running a bank or a
> nuclear power station so a problem won't matter.
>
> We'd have to have a bidirectional lookup for this identifier. InChI
> <==> BO. That could be done with RDF, and would be a fun exercise. If
> we get above 100,000 triples we will have succeeded anyway. There are
> people who are offering to host triple services for free. We could
> probably put it in our institutional repository for indexing chemistry
> theses.
>
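
(A minimal sketch of the BO identifier and the InChI <==> BO lookup
described above, under stated assumptions: the identifier is a truncated
MD5 of the InChI with a "BO-" prefix, a collision is resolved by salting
the input and hashing again, and two in-memory dictionaries stand in for
the RDF store. The class name BORegistry, the prefix, and the id length
are all invented here.)

```python
import hashlib
from typing import Dict, Optional


class BORegistry:
    """Toy two-way lookup between InChIs and short BO identifiers."""

    def __init__(self, id_length: int = 8) -> None:
        self.id_length = id_length
        self.inchi_to_bo: Dict[str, str] = {}
        self.bo_to_inchi: Dict[str, str] = {}

    def register(self, inchi: str) -> str:
        """Return the BO id for an InChI, assigning a new one if needed."""
        if inchi in self.inchi_to_bo:
            return self.inchi_to_bo[inchi]
        salt = 0
        while True:
            # Salt the input on retries so a collision yields a fresh id.
            text = inchi if salt == 0 else f"{inchi}#{salt}"
            digest = hashlib.md5(text.encode("utf-8")).hexdigest()
            bo_id = "BO-" + digest[: self.id_length]
            if bo_id not in self.bo_to_inchi:
                break
            salt += 1
        self.inchi_to_bo[inchi] = bo_id
        self.bo_to_inchi[bo_id] = inchi
        return bo_id

    def lookup_inchi(self, bo_id: str) -> Optional[str]:
        """Return the InChI registered under a BO id, or None."""
        return self.bo_to_inchi.get(bo_id)


registry = BORegistry()
ethanol = "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"
bo_id = registry.register(ethanol)
print(bo_id, registry.lookup_inchi(bo_id))
```

Registering the same InChI twice returns the same id, so the mapping stays
one-to-one in both directions. Note that the retry loop only terminates
while unused ids remain, so the id length has to be generous relative to
the number of structures.
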
> Anyway that is a first shot...
>
> P.
>
>
>
> Peter Murray-Rust
> Unilever Centre for Molecular Sciences Informatics
> University of Cambridge,
> Lensfield Road,  Cambridge CB2 1EW, UK
> +44-1223-763069
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2005.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Blueobelisk-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
>


