Re: [Blueobelisk-discuss] Blue Obelisk Identifiers and resources and images

Egon Willighagen Mon, 24 Sep 2007 04:40:27 -0700

Hi Peter,

On 9/22/07, peter murray-rust <[EMAIL PROTECTED]> wrote:
> The discussion on InChIs raises the question as to who creates and
> manages communal resources and metadata. InChIs work as they are
> algorithmic but they fail for inorganics (especially mineral
> polymorphs) and substances ("glucose", "glutamate"), etc. where
> conventional human-assigned identifiers are possible.


Chemical blogspace also makes use of URL identifiers, which also
provide us with forward compatibility with RDF technologies. For
example:

http://en.wikipedia.org/wiki/Glucose

where owl:sameAs is helpful.
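A minimal sketch of what such an owl:sameAs link could look like when serialized as an N-Triples line. The local molecule URI below is a hypothetical placeholder and does not reflect Chemical blogspace's actual URL scheme. Only the owl:sameAs predicate URI and the Wikipedia URL come from real sources.

```python
# Sketch: emit one owl:sameAs statement in N-Triples syntax.
# The "local" URI used in the demo is a HYPOTHETICAL placeholder.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(subject_uri: str, object_uri: str) -> str:
    """Return one N-Triples line asserting that two URIs denote the same resource."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


if __name__ == "__main__":
    local = "http://example.org/blogspace/molecule/glucose"  # hypothetical
    wiki = "http://en.wikipedia.org/wiki/Glucose"
    print(same_as_triple(local, wiki))
```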

> If the
> substances are sufficiently common they will be in Wikipedia and that
> should work as an excellent mechanism, and if they are in Pubchem
> that is also a possibility. But if they are new - "proposed molecule
> X" or in a publication, then we need something else.

Agreed. Chemical blogspace has about 150 out of 430 molecules not in
PubChem. Yes, blogspace is cutting edge science :)

> A similar problem arises with images for structures. There are
> several types, but specifically (a) semantic, often ugly but
> machine-compatible and (b) pretty - cf TotallySynth - often with
> unclear machine semantics (e.g. perspective). Both are needed - the
> MathML community also has this problem.

A free repository of curated 2D diagrams... youtube for molecules.

> Wikipedia solves the image problem by providing a repository of
> images and allowing multiple link throughs.  2 years ago I thought
> about providing an image drawing service for blogs - draw once, use
> many. If, say, we had JChempaint mounted on our server anyone could
> draw their image for the blog and link to it. The killer was that we
> couldn't (a) provide a robust server and (b) demand might kill it.
>
> But now we have unlimited free storage everywhere. So what about:
> (a) there are a number of molecule drawing sites. (Obviously we can't
> provide ChemDraw, but most others would allow it - Marvin, JME, ACD).
> The author would draw a structure and - for organics - get the
> InChI. The service would immediately search the BO server space for
> the identical InChI (or, excitingly, any InChI related by layers). If
> it found other InChIs it could display these to the author, who might
> wish to use one with, say, fuller stereochemistry. Or a prettier
> version (e.g. for macrocycles).  There might even be some clever
> language processing - e.g. paste a name from a journal and get the
> structure - Peter has been looking into that.
> (b) the software generates an image and names it with a unique name.
> Probably not the InChI but either Pubchem CID or a BO ID (see below).

BO ID, maybe:

http://blueobelisk.sourceforge.net/wiki/Bid99999999
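The quoted proposal does not pin down what "related by layers" means. One possible reading is "identical except for the stereochemical layers". Under that assumption, related InChIs can be grouped by a key that drops the /b, /t, /m and /s layers. This is only a sketch: a real service would probably want finer control, for example over the isotopic or charge layers. The alanine-like strings in the test are illustrative inputs.

```python
from collections import defaultdict

# Layer prefixes treated as "stereo" in this sketch (an assumption):
# b = double-bond stereo, t = tetrahedral, m = inverted/absolute flag, s = stereo type.
STEREO_PREFIXES = ("b", "t", "m", "s")


def stereo_free_key(inchi: str) -> str:
    """Drop stereo layers from an InChI string, keeping the 'InChI=1S' head and all other layers."""
    head, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join([head, *kept])


def group_related(inchis):
    """Group InChIs that differ only in their stereo layers."""
    groups = defaultdict(list)
    for inchi in inchis:
        groups[stereo_free_key(inchi)].append(inchi)
    return dict(groups)
```

With this key, a fully specified structure and a version with fuller or missing stereochemistry land in the same group. An author could then be offered the other members of that group, as the proposal suggests.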

> Then we post it to Flickr or Google or wherever.

Ack, and add the BO ID to that as annotation.

Tagging of molecules can be done via e.g. Connotea:

http://chem-bla-ics.blogspot.com/2007/09/tagging-molecules-mashup-of-connotea.html

> These sites remain
> stable so that authors could link to them from their blog. It would
> depend on the blog software how easy it was to download images into
> the text - Wordpress seems to do local files but not URLs - unless I
> have missed something. But actually cut and paste from remote images
> seems to work in many cases.
>
> So I suggest that we might need a BO identifier. It needs to be
> nearly unique. (If it collides once every few years the blogosphere
> will forgive you). I suspect that an MD5, or a datetime should be OK
> and this could be kept relatively short.
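To get a feel for "nearly unique" versus "relatively short", here is a sketch of a truncated-MD5 identifier plus a birthday-bound estimate of the chance of at least one collision. Two choices here are assumptions made for illustration: what gets hashed (here, an InChI string) and the truncation lengths.

```python
import hashlib
import math


def short_md5_id(text: str, hex_chars: int = 12) -> str:
    """Truncate the MD5 hex digest of `text` to `hex_chars` characters."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return digest[:hex_chars]


def collision_probability(n_ids: int, hex_chars: int) -> float:
    """Birthday approximation: P(at least one collision) among n_ids uniformly random IDs."""
    space = 16 ** hex_chars
    # 1 - exp(-n(n-1) / 2N), computed via expm1 for accuracy when the value is tiny
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


if __name__ == "__main__":
    print(short_md5_id("InChI=1S/CH4/h1H4"))
    for width in (8, 12):
        print(width, collision_probability(100_000, width))
```

The estimate suggests that 8 hex characters (32 bits) are too short once there are about 100,000 identifiers: a collision is then more likely than not. Twelve characters bring that risk down to roughly 1 in 50,000. Hashing the InChI also means that the same structure always maps to the same ID, which a datetime-based scheme would not do.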

Why not a simple numbering scheme like the one above? b99999999 allows us to
move forward for the foreseeable future.
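A sequential scheme is easy to sketch. The "Bid" prefix is taken from the example URL above, and the eight-digit width from b99999999. Skipping IDs that are already taken is one simple way to handle collisions. The class itself is hypothetical: it keeps state in memory for a single process, so several servers handing out IDs would still need to coordinate, for example by each owning a separate number range.

```python
class SequentialIdAllocator:
    """Hands out zero-padded IDs such as 'Bid00000001' (hypothetical format)."""

    def __init__(self, prefix: str = "Bid", width: int = 8, taken=None):
        self.prefix = prefix
        self.width = width
        self.taken = set(taken or ())  # IDs already in use elsewhere
        self.counter = 0

    def next_id(self) -> str:
        """Return the next unused ID, skipping any that are already taken."""
        while True:
            self.counter += 1
            if self.counter >= 10 ** self.width:
                raise OverflowError("identifier space exhausted")
            candidate = f"{self.prefix}{self.counter:0{self.width}d}"
            if candidate not in self.taken:
                self.taken.add(candidate)
                return candidate
```

Eight digits give 99,999,999 usable IDs. That is far more than the few hundred molecules currently in blogspace.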

> Or we could simply ask the
> servers to assign ids sequentially and use new ones if they collide.
> We can't be the first to do this. We aren't running a bank or a
> nuclear power station, so a few problems won't matter.

:)

Funny you mention this... last week in Ulm, someone suggested (forgot
who) during the CIC-CINF session that one reason against open access
chemistry is the (increased) risk of chemical terrorism :)

> We'd have to have a bidirectional lookup for this identifier. InChI
> <==> BO.

If possible...
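The "if possible" matters because, as noted at the start of the thread, not every substance has a usable InChI. The in-memory sketch below assumes a strict one-to-one mapping: a BO ID without an InChI simply has no entry, and conflicting assignments are rejected. The class and method names are hypothetical.

```python
class BidirectionalIndex:
    """One-to-one InChI <-> BO ID lookup (hypothetical, in-memory sketch)."""

    def __init__(self):
        self._by_inchi = {}
        self._by_bo_id = {}

    def add(self, inchi: str, bo_id: str) -> None:
        """Record a pair, refusing to remap either side to something else."""
        existing = self._by_inchi.get(inchi)
        if existing is not None and existing != bo_id:
            raise ValueError(f"{inchi} is already mapped to {existing}")
        owner = self._by_bo_id.get(bo_id)
        if owner is not None and owner != inchi:
            raise ValueError(f"{bo_id} is already assigned to {owner}")
        self._by_inchi[inchi] = bo_id
        self._by_bo_id[bo_id] = inchi

    def bo_id_for(self, inchi: str):
        """Forward lookup: InChI -> BO ID, or None if unknown."""
        return self._by_inchi.get(inchi)

    def inchi_for(self, bo_id: str):
        """Reverse lookup: BO ID -> InChI, or None (e.g. for inorganics without an InChI)."""
        return self._by_bo_id.get(bo_id)
```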

> That could be done with RDF, and would be a fun exercise. If
> we get above 100,000 triples we will have succeeded anyway. There are
> people who are offering to host triple services for free.

I got triples for chemistry... but an indexing service would be nice.
The tag annotation of the molecules is part of it.
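The same InChI <==> BO mapping, plus tags, can also be written as RDF triples. A pattern match over the triples then gives the lookup in both directions without a separate reverse index. This is a sketch only. It reuses the BO ID URL pattern proposed earlier in this thread, but the `example.org/bo#` predicates are made-up placeholders, not an existing Blue Obelisk vocabulary. The `match` function is a toy stand-in for what a real triple store would do.

```python
# Placeholder vocabulary (HYPOTHETICAL, not an existing schema).
HAS_INCHI = "<http://example.org/bo#inchi>"
HAS_TAG = "<http://example.org/bo#tag>"
BO_BASE = "http://blueobelisk.sourceforge.net/wiki/"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslash, quote and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def molecule_triples(bo_id: str, inchi: str, tags):
    """Build (subject, predicate, object) triples for one molecule."""
    subject = f"<{BO_BASE}{bo_id}>"
    triples = [(subject, HAS_INCHI, literal(inchi))]
    triples += [(subject, HAS_TAG, literal(tag)) for tag in tags]
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```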

Egon

-- 
----
http://chem-bla-ics.blogspot.com/

_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
