Re: [BlueObelisk-discuss] Getting the Most from This Discussion group

Peter Murray-Rust Fri, 19 Nov 2010 02:06:21 -0800

[Copied to Quixote list, Blue Obelisk and Henry Rzepa. Please don't
duplicate maillists in replies]

On Fri, Nov 19, 2010 at 3:02 AM, Phil B <pebou...@gmail.com> wrote:

> Hi:
>
> [adherence to the goals of the workshop]
>

I will respond to this in a positive and hopefully constructive manner. I
shall reply to the goals of the meeting: I'll list the first goal and then
comment, leaving the second for another post.

*Immediate Goal*: The goal of this workshop is not to produce a white
paper! Rather it is to identify a set of requirements, and *a group of
willing participants to develop open source code* [PMR's emph] to accelerate
knowledge sharing. Our starting point, and the only prerequisite to
participating, is the belief that we need to move *Beyond the PDF*.
Specifically, we think that better integration between the research paper
and research data is imperative - see our papers for more details on this
thinking, and please add your own so we know your thoughts!

*PMR: May I add to this that we need open data. Code without data to work on
is often useless. Many subjects are not as liberated as bioscience.*

Code is expensive and distributed working code is very expensive. I know -
I've been doing it for twenty years. What we do not want is a committee
deciding what should be done - we need a group of people who can actually
write code and run programs. It is extremely difficult to get a shared vision
and maintain it. Having said that I have managed it on about 3.5 occasions
and not managed it on about 10.

It is very likely that we will have to build on existing components. Luckily
there are some of those around. The components, source and data (if we
include it) must be completely Open. See http://www.opendefinition.org/ .
Source must be OSI-compliant. It is no good to have web services that
represent a walled garden.

A product can be built in a very short time. There are now hackathons - I
am running one in Cambridge just before the meeting - where a group of
committed hackers see what can be done in 24 hours. No planned sleep. It's
done by integrating existing components, mashups, lashups - glueware. It's
becoming very successful for government data. Here's one at the same time (
http://culturehackday.org.uk/ )

*It’s a weekend for the arts, software and hardware hackers to get
together and create
exciting things.  It’s about taking all the interesting things that are
hiding in some of our great arts organisations, and mixing them up in new
and interesting ways – making new cultural products and sparking new
relationships and ways of thinking. It’s about wondering what might happen
if some clever and interesting people thought about new ways of navigating
arts. But it’s also about having fun and making new stuff and collaborating
in different ways.

...
So we’re looking for institutions that are able to open up their data and
have ideas for things to do with it, as well as hackers and developers
who’ve got cultural and arts-based itches they want to scratch.*

I'm part of this. We've been working with JISC and The British Library to
open up their bibliography. We announced this yesterday. (
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2754 ). Lots of positive
reaction on twittersphere. It's going to change libraries, how they work and
engage with the community. We have now put in a grant to do this for
museums. The world's knowledge - except for sciences like chemistry and
several others - is going Open. If this workshop can actually get one or
more major publishers to make their data content Open - that would be a
worthwhile outcome. But make no mistake: what I am proposing is a seriously
disruptive approach and many will oppose it.

You do not - generally - get funding in academia to develop software [1]. And
you do not get rewarded for doing so. I've been very fortunate to have had a
stable job for some time and enough marginal resources to develop Chemical
Markup Language. I'll talk about CML in another thread, but it's now a large
body of experience, code and examples in semantic chemistry. It is almost
universally ignored by commercial organisations and Open Source is sometimes
even publicly dismissed in chemistry. None the less it's been possible to
put together a chemical infrastructure where almost all of the components
are present.

This has been done by an ad hoc group - the Blue Obelisk - about 5 years
old. This group has no money, no constitution, no membership, no meetings
(other than about 1.5 dinners per year). But it works from a shared vision
and writing components that interoperate. It does not have a top-down plan.
However we now have offerings in several areas which are as good as
commercial products and in some cases better. Our own work (Cambridge) on
text-mining is ahead of others and it would be useful if there were anything
to actually mine legally - we are restricted to using patents and theses. I
offer the Blue Obelisk as giving considerable experience for the type of
group you are proposing.

Over the past 6-7 years I have been 1-2 times a year to meetings called
something like "Databases in Quantum Chemistry", "Standards for MO programs",
"workflows for Compchem". Each of those has a worthy goal - to make programs
work together. The meeting usually finishes with the group deciding they are
going to work together and setting up a mailing list. A few sporadic mails,
and then nothing.

This year I went to yet another meeting in Zaragoza - "Databases in QC" -
about 25 people. I talked about the idea of uniting compchem using CML. A
number of people were sceptical of these ideas - particularly since I am not
a mainstream compchem. But 3-4 people really caught the vision.

We decided that we would physically meet in Cambridge in a month with a
WORKING prototype of a complete compchem infrastructure - from designing the
calculations, to creating the jobs, to gathering the results and converting
them to RDF with a SPARQL and chemical search endpoint.
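
As a rough illustration of the last step (gathering results and converting
them to RDF), here is a minimal sketch in Python. It is not the Quixote code:
the namespace http://example.org/compchem/, the property names and the
result fields are all hypothetical, and the match() helper only mimics a
single SPARQL triple pattern rather than a real endpoint.

```python
# Sketch only: the base URI, property names and result keys are
# hypothetical and chosen purely for illustration.
BASE = "http://example.org/compchem/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def result_to_triples(job_id, result):
    """Turn one flat result dict into (subject, predicate, object) triples."""
    subject = f"<{BASE}job/{job_id}>"
    triples = []
    for key, value in sorted(result.items()):
        predicate = f"<{BASE}prop/{key}>"
        if isinstance(value, float):
            # Numeric results become typed literals.
            obj = f'"{value}"^^<{XSD_DOUBLE}>'
        else:
            # Everything else becomes a plain string literal, with escaping.
            text = str(value).replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{text}"'
        triples.append((subject, predicate, obj))
    return triples


def to_ntriples(triples):
    """Serialise triples in N-Triples style, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    demo = result_to_triples(
        "job1", {"program": "demo-program", "energy_hartree": -76.4}
    )
    print(to_ntriples(demo))
```

A real system would load such statements into a triple store and query them
with SPARQL; the pattern matcher above only stands in for that idea.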

We managed it. There's lots of glueware but the prototype was created. It
has to be rebuilt - no prototype ever really survives. See
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2659 .

Now I am suggesting we do a similar exercise for BTPDF as a PRE-workshop
deliverable. We'd very much like BTPDF folks to be involved. The intention
is:
* choose a chemical scientific problem that is meaningful and apparently
tractable
* get volunteers to carry out literature review, computation and analysis;
distributed if possible.
* do the whole experiment completely in the Open. [That debars it
automatically from being published in normal chemical journals because
pre-publication means journals will not accept the work. But this will be a
new type of publication.]
* use Open Source and Open Data throughout.

In Quixote we work by exposing all semi-static material on Wikis, using
public mail groups for alerting, using Etherpads and Skype for rapid
interactive discussion once a week or more. We use REST/URI and the Ritchie
filesystem as our storage and all resources are world readable. The final
result is ingested into a triple store.

IMO this is a scientific publication. Of course we get no current
recognition or reward from that but I would rather develop new things than
fight with the counterproductive ritual of trying to publish data in PDF. We
might as a ritual gesture write it as a traditional paper and submit it to a
traditional journal but with the difference that it's written by machine. It
meets all the values of a current publication (priority, record, exposure
for review) except citation metrics.

We'll go ahead anyway. Henry Rzepa (copied) has great experience at finding
ideas in the literature and computing them. He has a potential candidate at
the moment where each volunteer can create part of this.

And you do not necessarily have to be a computational chemist to take part.
You just need to have a hacker mentality - be able to install software and
run programs and not get uptight when they don't work.

So I'm willing to take the chance (a) that this will work technically - it
will - and (b) that enough people will join it - always an unknown.

I'd like to involve Peter Sefton's Open infrastructure from the start. I
think it should become the mainstream of scientific communication.
Volunteers to help make contributions, reports etc. would be great. Anyone
with the right mindset can contribute.

That's probably enough for now. The actual data substrate will be published
chemistry data. I'll have to work carefully to make sure that I can extract
enough without the University getting cut off by publishers (this has
happened to me on more than one occasion even for perfectly legal
activities). But the volume is smallish at this stage and I think I know
where to look.

A great deal can be done in a month. The SAX API for XML - which is
probably on every computer on the planet - was developed on the XML-DEV
mailing list (Henry Rzepa and me) by a committed group of about 100
volunteers. David Megginson drove it in late 1997. It took a month. See
http://www.saxproject.org/sax1-history.html . [Of course I get no credit or
reward for it other than having made a meritocratic contribution to
progress.]
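
For readers who have not met SAX: it is an event-driven interface in which
the parser calls back into your code as it meets each element. The sketch
below uses Python's standard xml.sax module on a small, simplified CML-like
fragment. The fragment is made up for illustration (namespace omitted), and
the AtomCounter class and count_atoms function are hypothetical names, not
part of any CML toolkit.

```python
import xml.sax

# A simplified CML-like fragment for water; illustrative only,
# with the CML namespace declaration left out.
CML_FRAGMENT = b"""<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
</molecule>"""


class AtomCounter(xml.sax.ContentHandler):
    """Count atoms per element symbol as the parser streams through."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every opening tag.
        if name == "atom":
            symbol = attrs.get("elementType", "?")
            self.counts[symbol] = self.counts.get(symbol, 0) + 1


def count_atoms(xml_bytes):
    """Parse the document and return a dict of element symbol -> count."""
    handler = AtomCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


if __name__ == "__main__":
    print(count_atoms(CML_FRAGMENT))
```

Because the handler never builds a tree in memory, the same approach works
on very large files, which is one reason SAX spread so widely.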

We have nearly TWO months.

I'll post more details later.

P.

[1][I should gratefully acknowledge significant funding from Microsoft
Research - we have worked together to make the resulting code completely
Open, though it still has to run on a Word/Office stack. The immediate wider
benefit has been to create a standard representation of CML (Joe Townsend).]

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
