I think the new InChIKey (or "hashed InChI") should meet this need, without us having to create a BO identifier:
http://www.iupac.org/inchi/release102.html
Do you think this would work?
ACD and JChemPaint have already done versions that generate "Wikipedia-ready" PNG files with embedded InChIs. There are problems with Google finding these on the Web, though, so I think we may need to consider the InChIKey as a Google-friendly alternative. (Peter - will OpenBabel at WWMM be able to produce InChIKeys?)

The Wikimedia Commons does have a small collection of structure drawings. I'd like to see this expanded and tagged with something like InChIKeys, and we could probably get this mostly done over a couple of years if the chemists on WP support the idea (they probably would). This collection needs to be organised, but that's very easy - an afternoon's work. It does at least have clear, open copyrights, and easy links to Wikipedia and to related photos.

http://commons.wikimedia.org/wiki/Main_Page
http://commons.wikimedia.org/wiki/Category:Chemical_structures
http://commons.wikimedia.org/wiki/Category:Chemical_compounds

Could this collection be helpful as at least a small repository?

Martin

Martin A. Walker
Department of Chemistry
SUNY College at Potsdam
Potsdam, NY 13676 USA
+1 (315) 267-2271

peter murray-rust wrote:
> The discussion on InChIs raises the question as to who creates and
> manages communal resources and metadata. InChIs work as they are
> algorithmic but they fail for inorganics (especially mineral
> polymorphs) and substances ("glucose", "glutamate"), etc. where
> conventional human-assigned identifiers are possible. If the
> substances are sufficiently common they will be in Wikipedia and that
> should work as an excellent mechanism, and if they are in Pubchem
> that is also a possibility. But if they are new - "proposed molecule
> X" or in a publication, then we need something else.
>
> A similar problem arises with images for structures. There are
> several types, but specifically (a) semantic, often ugly but
> machine-compatible and (b) pretty - cf TotallySynth - often with
> unclear machine semantics (e.g. perspective).
> Both are needed - the MathML community also has this problem.
>
> Wikipedia solves the image problem by providing a repository of
> images and allowing multiple link throughs. 2 years ago I thought
> about providing an image drawing service for blogs - draw once, use
> many. If, say, we had JChemPaint mounted on our server anyone could
> draw their image for the blog and link to it. The killer was that
> (a) we couldn't provide a robust server and (b) demand might kill it.
>
> But now we have unlimited free storage everywhere. So what about:
> (a) there are a number of molecule drawing sites. (Obviously we can't
> provide ChemDraw, but most others would allow it - Marvin, JME, ACD).
> The author would draw a structure and - for organics - get the
> InChI. The service would immediately search the BO server space for
> the identical InChI (or, excitingly, any InChI related by layers). If
> it found other InChIs it could display these to the author, who might
> wish to use one with, say, fuller stereochemistry. Or a prettier
> version (e.g. for macrocycles). There might even be some clever
> language processing - e.g. paste a name from a journal and get the
> structure - Peter has been looking into that.
> (b) the software generates an image and names it with a unique name.
> Probably not the InChI but either Pubchem CID or a BO ID (see below).
> Then we post it to Flickr or Google or wherever. These sites remain
> stable so that authors could link to them from their blog. It would
> depend on the blog software how easy it was to download images into
> the text - Wordpress seems to do local files but not URLs - unless I
> have missed something. But actually cut and paste from remote images
> seems to work in many cases.
>
> So I suggest that we might need a BO identifier. It needs to be
> nearly unique. (If it collides once every few years the blogosphere
> will forgive you). I suspect that an MD5, or a datetime should be OK
> and this could be kept relatively short.
> Or we could simply ask the servers to assign ids sequentially and
> use new ones if they collide. We can't be the first to do this. We
> aren't running a bank or a nuclear power station so a few problems
> won't matter.
>
> We'd have to have a bidirectional lookup for this identifier. InChI
> <==> BO. That could be done with RDF, and would be a fun exercise. If
> we get above 100,000 triples we will have succeeded anyway. There are
> people who are offering to host triple services for free. We could
> probably put it in our institutional repository for indexing chemistry
> theses.
>
> Anyway that is a first shot...
>
> P.
>
> Peter Murray-Rust
> Unilever Centre for Molecular Sciences Informatics
> University of Cambridge,
> Lensfield Road, Cambridge CB2 1EW, UK
> +44-1223-763069
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2005.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Blueobelisk-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
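
For illustration only, here is a minimal sketch of the BO identifier idea from the quoted message: a short MD5-derived ID, a fallback when two IDs collide, a bidirectional InChI <==> BO lookup, and a crude "related by layers" search. Everything named here is an assumption made for the example and not an agreed Blue Obelisk scheme: the "BO-" prefix, the 10-hex-digit length, and the names BORegistry, main_layers and related_by_layers. The in-memory dictionaries merely stand in for the RDF triple store suggested above, and the layer comparison only looks at the formula, /c and /h layers.

```python
import hashlib


def main_layers(inchi: str) -> tuple:
    """Return the (formula, /c, /h) layers of an InChI string.

    Simplified on purpose: charge, stereo and isotopic layers are ignored.
    """
    parts = inchi.split("/")
    formula = parts[1] if len(parts) > 1 else ""
    conn = next((p for p in parts[2:] if p.startswith("c")), "")
    hyd = next((p for p in parts[2:] if p.startswith("h")), "")
    return formula, conn, hyd


def related_by_layers(a: str, b: str) -> bool:
    """True when two InChIs agree on formula, connectivity and hydrogens."""
    return main_layers(a) == main_layers(b)


class BORegistry:
    """Hypothetical in-memory InChI <==> BO identifier lookup."""

    def __init__(self, length: int = 10):
        self.length = length      # hex digits kept from the MD5 digest
        self.inchi_to_bo = {}
        self.bo_to_inchi = {}

    def register(self, inchi: str) -> str:
        # The same InChI always gets the same identifier back.
        if inchi in self.inchi_to_bo:
            return self.inchi_to_bo[inchi]
        digest = hashlib.md5(inchi.encode("utf-8")).hexdigest()
        n = self.length
        # On a collision, lengthen the prefix until the identifier is free.
        while "BO-" + digest[:n] in self.bo_to_inchi:
            if n >= len(digest):
                raise ValueError("full MD5 collision; assign an ID by hand")
            n += 1
        bo = "BO-" + digest[:n]
        self.inchi_to_bo[inchi] = bo
        self.bo_to_inchi[bo] = inchi
        return bo

    def lookup_inchi(self, bo: str):
        return self.bo_to_inchi.get(bo)

    def lookup_bo(self, inchi: str):
        return self.inchi_to_bo.get(inchi)

    def related(self, inchi: str) -> list:
        """BO ids of other registered InChIs sharing the main layers."""
        return [bo for other, bo in self.inchi_to_bo.items()
                if other != inchi and related_by_layers(inchi, other)]


if __name__ == "__main__":
    registry = BORegistry()
    alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
    l_alanine = alanine + "/t2-/m0/s1"
    registry.register(alanine)
    bo = registry.register(l_alanine)
    print(bo, registry.lookup_inchi(bo) == l_alanine)
    print(registry.related(alanine))  # the id of the stereo-specific entry
```

An author drawing alanine without stereochemistry would then be offered the fuller L-alanine entry through related(). Lengthening the hash prefix on a collision is one possible reading of "use new ones if they collide"; sequential server-assigned ids would work just as well.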
