I had some problems posting to this list yesterday and today, I think because 
I've been getting CC/BCC copies, which made me think I was a list member, but 
I'm not sure. Here goes attempt #2, and my apologies for any duplicates.


Begin forwarded message:

From: Andrew Dalke <[email protected]>
Date: December 8, 2009 1:00:13 PM GMT+01:00
To: BlueObelisk-Discuss <[email protected]>
Subject: Re: What is an Open Standard?

On Dec 8, 2009, at 10:32 AM, Egon Willighagen wrote:
> Right now, the wiki [0] says as requirements:
> 
> * redistributable
> * well and openly specified

It also states

 a proprietary format, even if published, is not Open as
 there is no community process for its development

which would mean that

* a community process

is required in order to be open. Though it doesn't say if a company working 
with its customers counts as a community.

> It also says the being able to modify the specification does not have
> to be allowed. And this is where it is quite different from the Open
> Source and Open Data ideas. There is something to be said about this,
> as standards must not change rapidly, to allow people using the
> standard to keep up.

Yet somehow there are a number of standards (HTML, for example) where vendors 
extend the specification and add new features, like the canvas tag.

As well as standards, like the PDB format, which very few programs follow 
precisely. For one, it's hard to generate a new PDB file with the mandatory 
HEADER line if there's no PDB id or deposition date.
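
A minimal sketch of the HEADER problem, assuming the fixed-column layout 
from the PDB format documentation (record name in columns 1-6, classification 
in 11-50, deposition date in 51-59, ID code in 63-66). The function name and 
the sample values are made up for illustration:

```python
def pdb_header(classification="", dep_date="", id_code=""):
    """Format a PDB HEADER record in its fixed-column layout (a sketch)."""
    # Columns 1-6: "HEADER"; 7-10: blank; 11-50: classification;
    # 51-59: deposition date (DD-MMM-YY); 60-62: blank; 63-66: ID code.
    line = "HEADER    {:<40.40}{:<9.9}   {:<4.4}".format(
        classification, dep_date, id_code)
    # Pad to the usual 80-column record width.
    return line.ljust(80)

# A deposited structure has all three fields (values here are hypothetical).
deposited = pdb_header("HYDROLASE", "01-JAN-09", "1ABC")

# A locally generated structure has no deposition date and no PDB id,
# so the only option is to leave those columns blank or invent values.
generated = pdb_header("HYDROLASE")

print(deposited.rstrip())
print(generated.rstrip())
```

A program that writes the second form emits a line shaped like a HEADER 
record but with nothing in the date and id columns, which is the gap 
described above.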

It is interesting to read an RFC copyright statement. This one is from RFC 2616:

  Copyright (C) The Internet Society (1999). All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any kind,
  provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works. However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of developing
  Internet standards in which case the procedures for copyrights defined
  in the Internet Standards process must be followed, or as required
  to translate it into languages other than English.

As another example, ISO 8601 (the common time and date formats) is also a 
proprietary spec, costing about US$100 to purchase, I think.

There are also well known issues about patents and specifications, where the 
specification may be open but use a patented algorithm. This is addressed in 
the GPL3 and in statements like the following, from Google Wave:

http://www.waveprotocol.org/patent-license

  Subject to the terms and conditions of this License, Google and
  its affiliates hereby grant to you a perpetual, worldwide,
  non-exclusive, no-charge, royalty free, irrevocable (except as
  stated in this License) patent license for patents necessarily
  infringed by implementation of this specification. If you institute
  patent litigation against any entity (including a cross-claim or
  counterclaim in a lawsuit) alleging that the implementation of
  the specification constitutes direct or contributory patent
  infringement, then any patent licenses for the specification
  granted to you under this License shall terminate as of the date
  such litigation is filed.


> These aim at ensuring that a specification can be fully and independently 
> implemented.

Well, but then that's really the goal, isn't it?

I think mentioning IUPAC's InChI code is relevant here. The code is available, 
but there is no specification. Literally, the code is the spec. This means 
people must guess as to which parts of the program are fundamental to InChI and 
which, like the parsing of SD files or support for backwards compatible bugs, 
are only implementation artifacts.

Speaking of InChI, because the software is distributed under the LGPL, anyone 
wanting to make a new implementation must necessarily start by reading the 
existing source code. The resulting code may be under suspicion of copying the 
LGPL code unless it is written using something like a clean room design

    http://en.wikipedia.org/wiki/Clean_room_design

This would affect someone like me who prefers BSD-style licenses, and would 
also impair someone who wanted "independently implemented" code.

There is no incentive for the InChI developers to aid in making a new 
implementation, because their primary goal is consistency, and new code will 
likely have bugs, or perhaps just differences of opinion. They also do not have 
the funding to work on all the things they might do with InChI.

Do bear in mind that InChI is also not meant as an exchangeable structure 
format. It's meant as a unique identifier system, so my ideas, which are 
based on SMILES, aren't really as applicable as I would like.

> But, the way the wiki now reads seem to approve that the standard may
> be developed in a closed community.

Please define "open" and "closed" communities in this context.

I'll add that I don't like the wide use of the term community. It seems to 
imply something stronger or perhaps more emotional than "open organization" or 
"open group", when that is not justified.

I asked earlier about what "community" means in the context of a company. Can a 
company have a community? Consider the group of "OpenEye users". It's 
relatively open. Non-commercial users get a free license and commercial users 
must pay. I, for example, get a non-commercial license. The main users' group 
conference in the US is no cost and open to anyone, and includes a T-shirt and 
some free food.

People show up every year to see friends and learn what others are doing with 
the OE tools. Some of these people have known each other for decades, starting 
with the earlier Daylight MUG conferences.

Come to CUP in Santa Fe this March - I'll be glad to show you around town!

If that is not open, what can a company do to make its process sufficiently 
open to meet with Blue Obelisk's approval?


> Is that something the Blue Obelisk should approve, or should we promote
> the standard development to be Open for the community too? The OpenSMILES
> project certainly is qualifies as that.

The current Blue Obelisk position, which says that SMILES (not OpenSMILES) is a 
proprietary standard, is one which I have brought up and argued against several 
times before, in different forums.

SMILES was first published in JCICS, Weininger, D. (1988), with a more recent 
version published in "Handbook of Chemoinformatics" (ed. Gasteiger, pub. Wiley) 
and of course the very detailed documentation at daylight.com. How could the 
documentation be more open?

Dave wrote SMILES with the intention of it being a language that chemists could 
use to talk to each other even 100 years in the future. He also "always 
encouraged the widespread adoption of SMILES, and helped anyone who wanted to 
write a parser." (Quoting Craig James from 
http://depth-first.com/articles/2007/11/14/making-the-case-opensmiles ) 

I know this to be true because I know how much input I got from Dave on my own 
SMILES parsers over time. The Daylight user group conferences ("MUG") were also 
free and open to anyone. How could the support from the implementer be more 
open?

In addition to Daylight SMILES, there are variations including:
- Syracuse SMILES, which has "CL" and "BR" in the organic subset (which is all 
I know of it)
- OpenEye SMILES 
(http://www.eyesopen.com/docs/html/pyprog/ExtensionstoDaylightSMILES.html)
- Tripos SLN (that's more inspired by SMILES as they aren't intercompatible)
- OpenBabel SMILES (which includes a notation for radicals)

all of which have their own implementations. Plus of course the implementations 
in at least a dozen other programs.

Doesn't this show that the specification was not closed?

All this occurred before OpenSMILES existed. Why then does only OpenSMILES 
count, and not these other projects across the entire 21+ year history of 
SMILES?

In any case, I really don't see how SMILES even before OpenSMILES could be 
considered less open than either CML or InChI are today.

(BTW, bonus points for someone who can point me to the CML copyright statement 
and license!)


On a related note, I bring up the MDL connection table formats. They are very 
nicely documented in
http://www.mdli.com/downloads/public/ctfile/ctfile.jsp
and available without registration. This documentation may not be 
redistributed, which is a problem. It is an update of the original format spec 
in JCICS 1992 (Dalby et al.).

Would this be considered a non-proprietary format if some arbitrary person or 
group of people wrote up an equivalent "OpenCT" document with a Creative 
Commons - No Derivatives copyright? Or if we convinced MDL to allow 
redistribution of the PDF?

Food for thought as you all work on this.

                                Andrew
                                [email protected]

