At 21:50 22/09/2007, Martin A. Walker wrote:
>I think the new InChIKey (or "hashed InChI") should meet this need, without
>us having to create a BO identifier:
>http://www.iupac.org/inchi/release102.html
>Do you think this would work?

Yes. And I think it should probably be used.

>ACD and JChemPaint have already done versions that generate
>"Wikipedia-ready" PNG files with embedded InChIs. There are problems with
>Google finding these on the Web, though, so I think we may need to
>consider the InChIKey as a Google-friendly alternative. (Peter - will
>OpenBabel at WWMM be able to produce InChIKeys?)

We are reviewing our web services and trying to make them lightweight.
We'll try to take the opportunity to use the latest versions.

>The Wikimedia Commons does have a small collection of structure drawings.

excellent

>I'd like to see this expanded and tagged with something like InChIKeys,

yes

>and we could probably get this mostly done over a couple of years if the
>chemists on WP support the idea (they probably would).

yes

>This collection
>needs to be organised, but that's very easy - an afternoon's work. It
>does at least have clear, open copyrights, and easy links to Wikipedia and
>to related photos.
>
>http://commons.wikimedia.org/wiki/Main_Page
>http://commons.wikimedia.org/wiki/Category:Chemical_structures
>http://commons.wikimedia.org/wiki/Category:Chemical_compounds
>
>Could this collection be helpful as at least a small repository?

Yes. For all common compounds *with InChIs* we should have diagrams. The
problem comes with compounds for which there are no InChIs. I have shown
some examples on my blog. This is not an easy problem.

P.

Keep up the good work. I am predicting that WP-chem will be the primary
chemical teaching resource in 5 years.

>Martin
>
>Martin A. Walker
>Department of Chemistry
>SUNY College at Potsdam
>Potsdam, NY 13676 USA
>+1 (315) 267-2271
>
>peter murray-rust wrote:
> > The discussion on InChIs raises the question as to who creates and
> > manages communal resources and metadata. InChIs work as they are
> > algorithmic but they fail for inorganics (especially mineral
> > polymorphs) and substances ("glucose", "glutamate"), etc.
> > where conventional human-assigned identifiers are possible. If the
> > substances are sufficiently common they will be in Wikipedia and that
> > should work as an excellent mechanism, and if they are in PubChem
> > that is also a possibility. But if they are new - "proposed molecule
> > X" or in a publication, then we need something else.
> >
> > A similar problem arises with images for structures. There are
> > several types, but specifically (a) semantic, often ugly but
> > machine-compatible and (b) pretty - cf TotallySynth - often with
> > unclear machine semantics (e.g. perspective). Both are needed - the
> > MathML community also has this problem.
> >
> > Wikipedia solves the image problem by providing a repository of
> > images and allowing multiple link throughs. 2 years ago I thought
> > about providing an image drawing service for blogs - draw once, use
> > many. If, say, we had JChemPaint mounted on our server anyone could
> > draw their image for the blog and link to it. The killer was that we
> > couldn't (a) provide a robust server and (b) demand might kill it.
> >
> > But now we have unlimited free storage everywhere. So what about:
> > (a) there are a number of molecule drawing sites. (Obviously we can't
> > provide ChemDraw, but most others would allow it - Marvin, JME, ACD).
> > The author would draw a structure and - for organics - get the
> > InChI. The service would immediately search the BO server space for
> > the identical InChI (or, excitingly, any InChI related by layers). If
> > it found other InChIs it could display these to the author, who might
> > wish to use one with, say, fuller stereochemistry. Or a prettier
> > version (e.g. for macrocycles). There might even be some clever
> > language processing - e.g. paste a name from a journal and get the
> > structure - Peter has been looking into that.
> > (b) the software generates an image and names it with a unique name.
> > Probably not the InChI but either PubChem CID or a BO ID (see below).
> > Then we post it to Flickr or Google or wherever. These sites remain
> > stable so that authors could link to them from their blog. It would
> > depend on the blog software how easy it was to download images into
> > the text - WordPress seems to do local files but not URLs - unless I
> > have missed something. But actually cut and paste from remote images
> > seems to work in many cases.
> >
> > So I suggest that we might need a BO identifier. It needs to be
> > nearly unique. (If it collides once every few years the blogosphere
> > will forgive you). I suspect that an MD5, or a datetime should be OK
> > and this could be kept relatively short. Or we could simply ask the
> > servers to assign ids sequentially and use new ones if they collide.
> > We can't be the first to do this. We aren't running a bank or a
> > nuclear power station so a few problems won't matter.
> >
> > We'd have to have a bidirectional lookup for this identifier. InChI
> > <==> BO. That could be done with RDF, and would be a fun exercise. If
> > we get above 100,000 triples we will have succeeded anyway. There are
> > people who are offering to host triple services for free. We could
> > probably put it in our institutional repository for indexing chemistry
> > theses.
> >
> > Anyway that is a first shot...
> >
> > P.
> >
> > Peter Murray-Rust
> > Unilever Centre for Molecular Sciences Informatics
> > University of Cambridge,
> > Lensfield Road, Cambridge CB2 1EW, UK
> > +44-1223-763069
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Microsoft
> > Defy all challenges. Microsoft(R) Visual Studio 2005.
> > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > _______________________________________________
> > Blueobelisk-discuss mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss

Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK
+44-1223-763069
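
A rough sketch of the BO identifier idea from the quoted message (a short,
nearly unique id derived from an MD5, a fallback when two ids collide, and a
bidirectional InChI <==> BO lookup). Everything named here is hypothetical:
the BORegistry class, the "BO-" prefix, the 8-character default length and
the in-memory dictionaries are assumptions made for illustration, not an
agreed Blue Obelisk scheme. It is also not the InChIKey algorithm, and a real
service would more likely store the mapping as RDF triples than in a Python
dict.

```python
import hashlib


class BORegistry:
    """Hypothetical in-memory registry mapping InChI <==> short BO id."""

    def __init__(self, id_length=8):
        # Number of hex digits of the MD5 digest kept by default (assumption).
        self.id_length = id_length
        self.inchi_to_bo = {}
        self.bo_to_inchi = {}

    def register(self, inchi):
        """Return the BO id for an InChI, creating a new id if needed."""
        if inchi in self.inchi_to_bo:
            return self.inchi_to_bo[inchi]
        digest = hashlib.md5(inchi.encode("utf-8")).hexdigest()
        # Try a short prefix first; lengthen it only when it collides
        # with an id already assigned to a different InChI.
        for length in range(self.id_length, len(digest) + 1):
            candidate = "BO-" + digest[:length]
            if candidate not in self.bo_to_inchi:
                break
        else:
            raise RuntimeError("no free id: the full MD5 digest collided")
        self.inchi_to_bo[inchi] = candidate
        self.bo_to_inchi[candidate] = inchi
        return candidate

    def lookup_inchi(self, bo_id):
        """BO id -> InChI, or None if the id is unknown."""
        return self.bo_to_inchi.get(bo_id)

    def lookup_bo(self, inchi):
        """InChI -> BO id, or None if the InChI is not registered."""
        return self.inchi_to_bo.get(inchi)


if __name__ == "__main__":
    registry = BORegistry()
    methane = "InChI=1S/CH4/h1H4"  # illustrative InChI string
    bo_id = registry.register(methane)
    print(bo_id, "<==>", registry.lookup_inchi(bo_id))
```

Lengthening the prefix on collision keeps ids short in the common case while
still guaranteeing that two different InChIs never share an id within one
registry. The sequential-id alternative mentioned in the message would need
no hashing at all, only a counter on the server.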

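
A minimal sketch of the "InChI related by layers" search suggested in the
quoted message, assuming that "related" means two InChIs share the formula,
connectivity (/c) and hydrogen (/h) layers and differ only in later layers
such as stereochemistry. The function names and the naive split on "/" are
assumptions for illustration; this is not a full InChI parser (for example,
it does not treat the fixed-hydrogen layer specially), and the alanine and
ethanol strings are only illustrative inputs.

```python
def split_layers(inchi):
    """Split an InChI string into (formula, {layer prefix: layer text})."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    # parts[0] is the version (e.g. "1S"), parts[1] the chemical formula.
    formula = parts[1] if len(parts) > 1 else ""
    layers = {}
    for part in parts[2:]:
        if part:
            # Keep the first occurrence of each one-letter layer prefix.
            layers.setdefault(part[0], part[1:])
    return formula, layers


def core(inchi):
    """Formula plus connectivity and hydrogen layers, ignoring the rest."""
    formula, layers = split_layers(inchi)
    return (formula, layers.get("c", ""), layers.get("h", ""))


def related_by_layers(a, b):
    """True if two different InChIs share the same core layers."""
    return a != b and core(a) == core(b)


def find_related(query, known):
    """Return the known InChIs that are layer-related to the query."""
    return [k for k in known if related_by_layers(query, k)]


if __name__ == "__main__":
    alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
    l_alanine = alanine + "/t2-/m0/s1"
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    for hit in find_related(alanine, [l_alanine, ethanol]):
        print("related:", hit)
```

A drawing service could run something like find_related against the stored
InChIs and offer the author any hit with fuller stereochemistry than the
structure just drawn.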