Re: [Blueobelisk-discuss] Blue Obelisk Identifiers and resources and images

peter murray-rust Mon, 24 Sep 2007 15:35:38 -0700

At 21:50 22/09/2007, Martin A. Walker wrote:
>I think the new InChIKey (or "hashed InChI") should meet this need, without
>us having to create a BO identifier:
>http://www.iupac.org/inchi/release102.html
>Do you think this would work?


Yes. And I think it should probably be used.


>ACD and JChemPaint have already done versions that generate
>"Wikipedia-ready" PNG files with embedded InChIs.  There are problems with
>Google finding these on the Web, though, so I think we may need to
>consider the InChIKey as a Google-friendly alternative.  (Peter - will
>OpenBabel at WWMM be able to produce InChIKeys?)

We are reviewing our web services and trying to make them
lightweight. We'll try to take the opportunity to use the latest versions.


>The Wikimedia Commons does have a small collection of structure drawings.

excellent

>I'd like to see this expanded and tagged with something like InChIKeys,

yes

>and we could probably get this mostly done over a couple of years if the
>chemists on WP support the idea (they probably would).

yes

>  This collection
>needs to be organised, but that's very easy - an afternoon's work.  It
>does at least have clear, open copyrights, and easy links to Wikipedia and
>to related photos.
>
>http://commons.wikimedia.org/wiki/Main_Page
>http://commons.wikimedia.org/wiki/Category:Chemical_structures
>http://commons.wikimedia.org/wiki/Category:Chemical_compounds
>
>Could this collection be helpful as at least a small repository?

Yes

For all common compounds *with InChIs* we should have diagrams. The
problem comes with compounds for which there are no InChIs. I have 
shown some examples on my blog.

This is not an easy problem.

P.

Keep up the good work. I am predicting that WP-chem will be the
primary chemical teaching resource in 5 years.


>Martin
>
>Martin A. Walker
>Department of Chemistry
>SUNY College at Potsdam
>Potsdam, NY 13676 USA
>+1 (315) 267-2271
>
>
>
>
>peter murray-rust wrote:
> > The discussion on InChIs raises the question as to who creates and
> > manages communal resources and metadata. InChIs work as they are
> > algorithmic but they fail for inorganics (especially mineral
> > polymorphs) and substances ("glucose", "glutamate"), etc. where
> > conventional human-assigned identifiers are possible. If the
> > substances are sufficiently common they will be in Wikipedia and that
> > should work as an excellent mechanism, and if they are in PubChem
> > that is also a possibility. But if they are new - "proposed molecule
> > X" or in a publication, then we need something else.
> >
> > A similar problem arises with images for structures. There are
> > several types, but specifically (a) semantic, often ugly but
> > machine-compatible and (b) pretty - cf TotallySynth - often with
> > unclear machine semantics (e.g. perspective). Both are needed - the
> > MathML community also has this problem.
> >
> > Wikipedia solves the image problem by providing a repository of
> > images and allowing multiple link throughs.  2 years ago I thought
> > about providing an image drawing service for blogs - draw once, use
> > many. If, say, we had JChemPaint mounted on our server anyone could
> > draw their image for the blog and link to it. The killer was that (a)
> > we couldn't provide a robust server and (b) demand might kill it.
> >
> > But now we have unlimited free storage everywhere. So what about:
> > (a) there are a number of molecule drawing sites. (Obviously we can't
> > provide ChemDraw, but most others would allow it - Marvin, JME, ACD).
> > The author would draw a structure and - for organics - get the
> > InChI. The service would immediately search the BO server space for
> > the identical InChI or (excitingly) any InChI related by layers. If
> > it found other InChIs it could display these to the author, who might
> > wish to use one with, say, fuller stereochemistry. Or a prettier
> > version (e.g. for macrocycles).  There might even be some clever
> > language processing - e.g. paste a name from a journal and get the
> > structure - Peter has been looking into that.
> > (b) the software generates an image and names it with a unique name.
> > Probably not the InChI but either PubChem CID or a BO ID (see below).
> > Then we post it to Flickr or Google or wherever. These sites remain
> > stable so that authors could link to them from their blog. It would
> > depend on the blog software how easy it was to download images into
> > the text - Wordpress seems to do local files but not URLs - unless I
> > have missed something. But actually cut and paste from remote images
> > seems to work in many cases.
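
The layer-based search in step (a) above could look roughly like this.
It is only a sketch: it assumes two InChIs count as "related" when
their formula, connectivity (/c) and hydrogen (/h) layers agree, and it
ignores charge, isotopic and fixed-H sublayers. The function names are
made up for illustration, and the alanine and ethanol strings are just
example inputs.

```python
def split_layers(inchi):
    """Split an InChI string into version, formula, and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % (inchi,))
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0],
              "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            # Layers after the formula carry a one-letter prefix (c, h, t, ...).
            # Keep only the first occurrence of each prefix; later repeats
            # belong to isotopic or fixed-H sublayers, ignored in this sketch.
            layers.setdefault(part[0], part[1:])
    return layers


def same_skeleton(inchi_a, inchi_b):
    """True if formula, connectivity and hydrogen layers all agree."""
    a, b = split_layers(inchi_a), split_layers(inchi_b)
    return all(a.get(key) == b.get(key) for key in ("formula", "c", "h"))


def find_related(query, stored):
    """Return stored InChIs that share the query's skeleton but differ from it."""
    return [s for s in stored if s != query and same_skeleton(query, s)]


# Example inputs: alanine without and with stereo layers, plus ethanol.
ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
L_ALANINE = ALANINE + "/t2-/m0/s1"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

related = find_related(ALANINE, [L_ALANINE, ALANINE, ETHANOL])
```

Here an author who drew alanine without stereochemistry would be
offered the stored entry with the fuller stereo layers, and ethanol
would not be suggested.
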
> >
> > So I suggest that we might need a BO identifier. It needs to be
> > nearly unique. (If it collides once every few years the blogosphere
> > will forgive you). I suspect that an MD5, or a datetime should be OK
> > and this could be kept relatively short. Or we could simply ask the
> > servers to assign ids sequentially and use new ones if they collide.
> > We can't be the first to do this. We aren't running a bank or a
> > nuclear power station so a few problems won't matter.
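
As a rough illustration of the "short MD5, and pick a new one if it
collides" idea, here is a minimal sketch. The "BO-" prefix, the default
length of 10 hex characters and the in-memory registry are all
assumptions for the example, not an agreed format. MD5 is used only for
naming here, not for security.

```python
import hashlib


class BoRegistry:
    """Hands out short, nearly unique identifiers derived from an MD5 digest."""

    def __init__(self, length=10):
        self.length = length      # hex characters to try first (assumed default)
        self.by_id = {}           # identifier -> InChI that owns it

    def register(self, inchi):
        digest = hashlib.md5(inchi.encode("utf-8")).hexdigest()
        n = self.length
        while n <= len(digest):
            candidate = "BO-" + digest[:n]   # "BO-" prefix is hypothetical
            owner = self.by_id.setdefault(candidate, inchi)
            if owner == inchi:
                return candidate
            # Someone else owns this prefix: lengthen it and try again.
            n += 1
        raise RuntimeError("full MD5 collision between two different inputs")


registry = BoRegistry()
first = registry.register("InChI=1S/CH4/h1H4")
again = registry.register("InChI=1S/CH4/h1H4")   # same input, same identifier
```

Registering the same InChI twice returns the same identifier, and a
collision only makes the identifier one character longer rather than
failing.
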
> >
> > We'd have to have a bidirectional lookup for this identifier. InChI
> > <==> BO. That could be done with RDF, and would be a fun exercise. If
> > we get above 100,000 triples we will have succeeded anyway. There are
> > people who are offering to host triple services for free. We could
> > probably put it in our institutional repository for indexing chemistry
> > theses.
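
A minimal sketch of the InChI <==> BO lookup kept as triples, using
only plain Python rather than a real triple store. The namespace and
predicate URIs below are placeholders on example.org and the class name
is invented; a hosted RDF service would replace the two dictionaries.

```python
BO_NS = "http://example.org/bo/id/"               # hypothetical namespace
HAS_INCHI = "http://example.org/bo/terms/inchi"   # hypothetical predicate


class InchiIndex:
    """Keeps (subject, predicate, object) triples plus lookups in both directions."""

    def __init__(self):
        self.triples = set()
        self._to_inchi = {}
        self._to_bo = {}

    def add(self, bo_id, inchi):
        self.triples.add((BO_NS + bo_id, HAS_INCHI, inchi))
        self._to_inchi[bo_id] = inchi
        self._to_bo[inchi] = bo_id

    def inchi_for(self, bo_id):
        return self._to_inchi.get(bo_id)

    def bo_for(self, inchi):
        return self._to_bo.get(inchi)

    def to_ntriples(self):
        """Serialise the triples as N-Triples lines with plain string literals."""
        lines = []
        for subject, predicate, obj in sorted(self.triples):
            literal = obj.replace("\\", "\\\\").replace('"', '\\"')
            lines.append('<%s> <%s> "%s" .' % (subject, predicate, literal))
        return "\n".join(lines)


index = InchiIndex()
index.add("BO-0001", "InChI=1S/CH4/h1H4")
```

The N-Triples output could then be handed to whichever service ends up
hosting the triples.
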
> >
> > Anyway that is a first shot...
> >
> > P.
> >
> >
> >
> > Peter Murray-Rust
> > Unilever Centre for Molecular Sciences Informatics
> > University of Cambridge,
> > Lensfield Road,  Cambridge CB2 1EW, UK
> > +44-1223-763069
> >
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Microsoft
> > Defy all challenges. Microsoft(R) Visual Studio 2005.
> > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > _______________________________________________
> > Blueobelisk-discuss mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
> >
>

Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069 


