On Dec 8, 2009, at 10:32 AM, Egon Willighagen wrote:
> Right now, the wiki [0] says as requirements:
>
> * redistributable
> * well and openly specified
It also states that a proprietary format, even if published, is not Open because there is no community process for its development, which would mean that

 * a community process

is required in order to be open. It doesn't say, though, whether a company working with its customers counts as a community.

> It also says the being able to modify the specification does not have
> to be allowed. And this is where it is quite different from the Open
> Source and Open Data ideas. There is something to be said about this,
> as standards must not change rapidly, to allow people using the
> standard to keep up.

Yet there are a number of standards (HTML, for one) where vendors extend the specification and add new features, like the canvas tag. There are also standards, like the PDB format, which very few programs follow precisely. For one, it's hard to generate a new PDB file with the mandatory HEADER line if there's no PDB id or deposition date.

It is interesting to read an RFC copyright notice. This one is from RFC 2616:

   Copyright (C) The Internet Society (1999). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

As another example, ISO 8601 (the common time and date formats) is also a proprietary spec, costing about US$100 to purchase, I think.
There are also well-known issues with patents and specifications, where the specification may be open but uses a patented algorithm. This is addressed in the GPL3 and in statements like the following, from Google Wave:

  http://www.waveprotocol.org/patent-license

   Subject to the terms and conditions of this License, Google and its
   affiliates hereby grant to you a perpetual, worldwide, non-exclusive,
   no-charge, royalty free, irrevocable (except as stated in this
   License) patent license for patents necessarily infringed by
   implementation of this specification. If you institute patent
   litigation against any entity (including a cross-claim or
   counterclaim in a lawsuit) alleging that the implementation of the
   specification constitutes direct or contributory patent infringement,
   then any patent licenses for the specification granted to you under
   this License shall terminate as of the date such litigation is filed.

> These aim at ensuring that a specification can be fully and independently
> implemented.

Well, but then that's really the goal, isn't it?

I think mentioning IUPAC's InChI code is relevant here. The code is available, but there is no specification. Literally, the code is the spec. This means people must guess which parts of the program are fundamental to InChI and which, like the parsing of SD files or support for backwards-compatible bugs, are only implementation artifacts.

Speaking of InChI, because the software is distributed under the LGPL, anyone wanting to make a new implementation must necessarily start by reading the existing source code. The resulting code may be under suspicion of copying the LGPL code unless it was written using something like a clean room design:

  http://en.wikipedia.org/wiki/Clean_room_design

This would affect someone like me who prefers BSD-style licenses, and would also impair someone who wanted "independently implemented" code.
There is no incentive for the InChI developers to aid in making a new implementation, because their primary goal is consistency, and new code will likely have bugs, or perhaps just differences of opinion. They also do not have the funding to work on all the things they might do with InChI.

Do bear in mind that InChI is also not meant as an exchangeable structure format. It's meant as a unique identifier system, so my ideas, which are based on SMILES, aren't really as applicable as I would like.

> But, the way the wiki now reads seem to approve that the standard may
> be developed in a closed community.

Please define "open" and "closed" communities in this context.

I'll add that I don't like the wide use of the term "community". It seems to imply something stronger or perhaps more emotional than "open organization" or "open group", when that is not justified.

I asked earlier about what "community" means in the context of a company. Can a company have a community? Consider the group of "OpenEye users". It's relatively open. Non-commercial users get a free license and commercial users must pay. I, for example, get a non-commercial license. The main users' group conference in the US is no cost and open to anyone, and includes a T-shirt and some free food. People show up every year to see friends and learn what others are doing with the OE tools. Some of these people have known each other for decades, starting with the earlier Daylight MUG conferences. Come to CUP in Santa Fe this March - I'll be glad to show you around town!

If that is not open, what can a company do to make its process sufficiently open to meet with Blue Obelisk's approval?

> Is that something the Blue Obelisk should approve, or should we promote
> the standard development to be Open for the community too? The OpenSMILES
> project certainly is qualifies as that.
The current Blue Obelisk position, which says that SMILES (not OpenSMILES) is a proprietary standard, is one which I have brought up and argued against several times before, in different forums.

SMILES was first published in JCICS, Weininger, D. (1988), with a more recent version published in "Handbook of Chemoinformatics" (ed. Gasteiger, pub. Wiley) and of course the very detailed documentation at daylight.com. How could the documentation be more open?

Dave wrote SMILES with the intention of it being a language that chemists could use to talk to each other even 100 years in the future. He also "always encouraged the widespread adoption of SMILES, and helped anyone who wanted to write a parser." (Quoting Craig James from http://depth-first.com/articles/2007/11/14/making-the-case-opensmiles ) I know this to be true because I know how much input I got from Dave on my own SMILES parsers over time. The Daylight user group conferences ("MUG") were also free and open to anyone. How could the support from the implementer be more open?

In addition to Daylight SMILES, there are variations including:
 - Syracuse SMILES, which has "CL" and "BR" in the organic subset
     (which is all I know of it)
 - OpenEye SMILES
     (http://www.eyesopen.com/docs/html/pyprog/ExtensionstoDaylightSMILES.html)
 - Tripos SLN (that's more inspired by SMILES, as they
     aren't intercompatible)
 - OpenBabel SMILES (which includes a notation for radicals)

all of which have their own implementations. Plus of course the implementations in at least a dozen other programs. Doesn't this show that the specification was not closed?

All this occurred before OpenSMILES existed. Why then does only OpenSMILES count, and not these other projects across the entire 21+ year history of SMILES?

In any case, I really don't see how SMILES, even before OpenSMILES, could be considered less open than either CML or InChI are today. (BTW, bonus points for someone who can point me to the CML copyright statement and license!)
On a related note, I bring up the MDL connection table formats. They are very nicely documented in

  http://www.mdli.com/downloads/public/ctfile/ctfile.jsp

and available without registration. This documentation may not be redistributed, which is a problem. It is an update of the original format spec in JCICS 1992 (Dalby et al.).

Would this be considered a non-proprietary format if some arbitrary person or group of people wrote up an equivalent "OpenCT" document under a Creative Commons - No Derivatives license? Or if we convinced MDL to allow redistribution of the PDF?

Food for thought as you all work on this.

				Andrew
