On Dec 10, 2009, at 8:19 AM, Peter Murray-Rust wrote:
> The bottom line is that licences are a very mixed blessing and that the
> concept of "Community Norms" is primary (which is where I take the term
> "community" from).

I would still like to know what defines a "community", and I would like to know why that term is preferred over "group" or "organization." I also point out that Blue Obelisk represents only a small percentage of the people who might be involved. For lack of a stronger definition, I'll say that "involved" corresponds roughly to those who have looked at a SMILES or SD file with the goal of understanding the underlying molecular structure information. (Let's say, those who know that there's a line in a connection table record which contains the atom and bond counts, or know that %12 is a form of ring closure in SMILES.)

> I would argue that the BO is seeking to find some community norms for what is
> an Open Standard in the practice of chemistry.

Unless "community" above only means "BO member", my argument is that BO must first have its own consistent definitions before it can hope to convince others in chemical informatics and the broader field of computational chemistry to adopt those goals.

I'm personally involved because open source is important to me. BO is the largest group developing open source software in this field, but I cannot be a member while I have issues with what I see as biased and unjustified assertions about what it means to be open. I have raised these issues before, with no reaction or change from the BO, so within the last week I've been making more forceful statements, including some postings to Blue Obelisk Exchange, in the hopes of getting this resolved to my satisfaction. I believe this current thread is due to my effort.

A fundamental problem I've had was the assertion that SMILES (by which I mean SMILES and not the SMILES canonicalization algorithm) and the MDL formats are proprietary and not open, with the same BO page saying that CML is open, and without any criteria to explain the distinction.
My own views are that SMILES is more open than CML, which is a little more open than the MDL formats, which are a lot more open than a Chime molecule string.

> 1. Access. The work shall be available as a whole and at no more than a
> reasonable reproduction cost, preferably downloading via the Internet without
> charge. The work must also be available in a convenient and modifiable form.

While not quite relevant to this topic, I've read that "the typical cost of full text papers at publishers' sites is $30-35"

  http://www.gale.cengage.com/reference/peter/200705/ACM.htm

which does not seem like a reasonable reproduction cost to me, but then, most of what I use is available at no charge. What's reasonable and what price would be unreasonable? Some vendors may feel that "what the market would bear" is reasonable.

Also relevant here is that the GNU site specifically encourages people to sell their free software for as much money as they want:

  http://www.gnu.org/philosophy/selling.html

Is this a difference between "free" and "open"?

The "Open Knowledge" guidelines also have some limitations as applied to a protocol. They say nothing about patent or trademark restrictions which may make the license irrelevant. The most common example is the LZW compression patent affecting the GIF spec. Trademarks are also used to protect and defend protocols.

The guidelines also say nothing about the appropriateness of reverse engineering, which may be important for those who wish to avoid the license issue altogether. For example, and I've not been able to verify this, I'm told that Mathematica's MathLink protocol includes sending a poem, the text being held under copyright. The goal was to prevent reverse engineering the spec, since that would entail making a copy in violation of Mathematica's license. I don't know if it was effective. In any case, the same could be done with an open spec, to force all users of the spec to release their code under an open license.

> But there are clear touchstones. A protocol which is only available to paying
> customers of a company (such as canonicalSMILES) cannot be regarded as Open.

Canonicalization is not a protocol. The protocol is SMILES, and canonicalization is an ordering of the atoms and bonds in that protocol. SMILES and the canonicalization algorithm were even published in two different (though closely tied) articles.

The direct analogy would be that I could take CML and make a canonical CML (starting with the canonical XML representation, and defining a canonical atom and bond ordering based perhaps on what InChI or OpenBabel reports). That canonical CML would be completely readable by any CML parser. Canonical CML may or may not be open, but that has little bearing on the openness of CML itself.

Similarly, the existence of canonical SMILES from Daylight should have little bearing on the openness of the SMILES protocol reported by Weininger in JCICS (1988), described in detail in many places, and where Weininger specifically wanted SMILES to be a language for chemists and enthusiastically helped those who wrote parsers for it and experimented with variations.

For that matter, sending canonical SMILES to other locations - the essence of an open protocol - is not that useful. Even with Daylight, the canonical algorithm has changed over time and it's impossible to know which one was used given a SMILES without redoing the canonicalization. If canonicalization is important then the best practice is to recanonicalize it yourself, using the same algorithm each time. There are several open and free packages which will do that.

> I think it will be valuable to see what other domains have to say about this.

The Blue Obelisk wiki had a page which pushed for open protocols, and gave examples of which protocols were and were not open. For the reasons I think I've now well described, I took issue with those viewpoints, and I'm grateful that the tone on that page has been moderated.
Still, I feel it would have been nice for BO to have come up with a consistent set of principles before making specific statements as to which protocols were and were not open.

Cheers,

  Andrew
  [email protected]

_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
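
To illustrate the point made above that canonicalization is an ordering of the atoms and bonds inside a format, rather than a format of its own, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `canonical_form` and the (atoms, bonds) tuple representation are hypothetical, and the brute-force search over all atom orderings is not Daylight's algorithm, nor what InChI or OpenBabel do. It only works for very small graphs. What it shows is that two differently ordered descriptions of the same molecule are both valid input, and that running the same canonicalization on each yields the same result.

```python
from itertools import permutations


def canonical_form(atoms, bonds):
    """Return one canonical (atoms, bonds) tuple for a tiny molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples with 0-based atom indices

    Hypothetical toy method: try every atom ordering and keep the
    lexicographically smallest description. Cost grows as n!, so this
    is only an illustration, not a practical algorithm.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[new] == old, so invert it to map old index -> new index.
        new_index = {old: new for new, old in enumerate(perm)}
        new_atoms = tuple(atoms[old] for old in perm)
        new_bonds = tuple(sorted(
            (min(new_index[i], new_index[j]),
             max(new_index[i], new_index[j]),
             order)
            for i, j, order in bonds
        ))
        candidate = (new_atoms, new_bonds)
        if best is None or candidate < best:
            best = candidate
    return best


# Ethanol written in two atom orders (C-C-O and O-C-C), plus dimethyl
# ether (C-O-C), which has the same atoms but different connectivity.
ethanol_a = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])

same = canonical_form(*ethanol_a) == canonical_form(*ethanol_b)
different = canonical_form(*ethanol_a) != canonical_form(*dimethyl_ether)
print(same, different)
```

The inputs `ethanol_a` and `ethanol_b` are equally readable before canonicalization. The canonical form only matters when comparing structures, and then only if both sides were produced by the same algorithm, which is why recanonicalizing received data yourself is the safer practice.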
