On Dec 10, 2009, at 8:19 AM, Peter Murray-Rust wrote:
> The bottom line is that licences are a very mixed blessing and that the 
> concept of "Community Norms" is primary (which is where I take the term 
> "community" from).

I would still like to know what defines a "community", and I would like to know 
why that term is preferred over "group" or "organization."

I also point out that Blue Obelisk represents only a small percentage of the 
people who might be involved. For lack of a stronger definition, I'll say that 
"involved" corresponds roughly to those who have looked at a SMILES or SD file 
with the goal of understanding the underlying molecular structure information.

(Let's say, those who know that there's a line in a connection table record 
which contains the atom and bond counts, or know that %12 is a form of ring 
closure in SMILES.)
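
As a rough illustration of the two details just mentioned, here is a minimal
Python sketch. It assumes the V2000 flavor of the MDL connection table, where
the counts line stores the atom count in columns 1-3 and the bond count in
columns 4-6, and it treats "%" followed by two digits as a SMILES ring closure
label. The function names and the sample inputs are made up for this example,
and the regular expression is a shortcut, not a SMILES parser.

```python
import re


def parse_counts_line(line):
    """Return (atom_count, bond_count) from a V2000 counts line.

    Assumes fixed-width fields: columns 1-3 hold the atom count and
    columns 4-6 hold the bond count.
    """
    return int(line[0:3]), int(line[3:6])


def two_digit_ring_closures(smiles):
    """Return the two-digit ring closure labels written as %nn.

    This only scans for the %nn pattern; it does not validate the SMILES.
    """
    return re.findall(r"%(\d\d)", smiles)


# Hypothetical sample inputs, for illustration only.
counts = parse_counts_line("  3  2  0  0  0  0  0  0  0  0999 V2000")
labels = two_digit_ring_closures("C%12CCCCC%12")
print(counts)
print(labels)
```

Each label should appear twice in a valid SMILES, once where the ring opens and
once where it closes.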


> I would argue that the BO is seeking to find some community norms for what is 
> an Open Standard in the practice of chemistry.

Unless "community" above only means "BO member", my argument is that BO must 
first have its own consistent definitions before it can hope to convince others 
in chemical informatics and the broader field of computational chemistry to 
adopt those goals.

I'm personally involved because open source is important to me. BO is the 
largest group developing open source software in this field, but I cannot be a 
member while I have issues with what I see as biased and unjustified assertions 
about what it means to be open.

I have raised these issues before, with no reaction or change from the BO, so 
within the last week I've been making more forceful statements, including some 
postings to Blue Obelisk Exchange, in the hopes of getting this resolved to my 
satisfaction. I believe this current thread is due to my effort.

A fundamental problem I've had was the assertion that SMILES (by which I mean 
SMILES and not the SMILES canonicalization algorithm) and the MDL formats are 
proprietary and not open, with the same BO page saying that CML is open, and 
without any criteria to explain the distinction.

My own view is that SMILES is more open than CML, which is a little more open 
than the MDL formats, which are a lot more open than a Chime molecule string.

> 1. Access. The work shall be available as a whole and at no more than a 
> reasonable reproduction cost, preferably downloading via the Internet without 
> charge. The work must also be available in a convenient and modifiable form.

While not quite relevant to this topic, I've read that "the typical cost of 
full text papers at publishers' sites is $30-35"

 http://www.gale.cengage.com/reference/peter/200705/ACM.htm 

which does not seem like a reasonable reproduction cost to me, but then, most 
of what I use is available at no charge. What's reasonable and what price would 
be unreasonable? Some vendors may feel that "what the market would bear" is 
reasonable.

Also relevant here is that the GNU site specifically encourages people to sell 
their free software for as much money as they want:

  http://www.gnu.org/philosophy/selling.html

Is this a difference between "free" and "open"?


The "Open Knowledge" guidelines also have some limitations as applied to a 
protocol. They say nothing about patent or trademark restrictions which may 
make the license irrelevant. The most common example is the LZW compression 
patent affecting the GIF spec. Trademarks are also used to protect and defend 
protocols.

The guidelines also say nothing about the appropriateness of reverse 
engineering, which may be important for those who wish to avoid the license 
issue altogether. For example, and I've not been able to verify this, I'm told 
that Mathematica's MathLink protocol includes sending a poem, the text being 
held under copyright. The goal was to prevent reverse engineering the spec, 
since that would entail making a copy in violation of Mathematica's license. I 
don't know if it was effective. In any case, the same could be done with an 
open spec, to force all users of the spec to release their code under an open 
license.


> But there are clear touchstones. A protocol which is only available to paying 
> customers of a company (such as canonicalSMILES) cannot be regarded as Open.

Canonicalization is not a protocol. The protocol is SMILES, and 
canonicalization is an ordering of the atoms and bonds in that protocol. SMILES 
and the canonicalization algorithm were even published in two different (though 
closely tied) articles.

The direct analogy would be that I could take CML and make a canonical CML 
(starting with the canonical XML representation, and defining a canonical atom 
and bond ordering based perhaps on what InChI or OpenBabel reports). That 
canonical CML would be completely readable by any CML parser.

Canonical CML may or may not be open, but that has little bearing on the 
openness of CML itself.

Similarly, the existence of canonical SMILES from Daylight should have little 
bearing on the openness of the SMILES protocol reported by Weininger in JCICS 
(1988), described in detail in many places, and where Weininger specifically 
wanted SMILES to be a language for chemists and enthusiastically helped those 
who wrote parsers for it and experimented with variations.

For that matter, sending canonical SMILES to other locations - the essence of 
an open protocol - is not that useful. Even with Daylight, the canonical 
algorithm has changed over time and it's impossible to know which one was used 
given a SMILES without redoing the canonicalization.

If canonicalization is important then the best practice is to recanonicalize it 
yourself, using the same algorithm each time. There are several open and free 
packages which will do that.
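
To make the "canonicalization is only an ordering" point concrete, here is a
toy Python sketch. It is not Daylight's algorithm or that of any toolkit; it is
a brute-force stand-in that tries every atom ordering of a tiny molecule and
keeps the smallest record. The function name and the (atoms, bonds) input
layout are assumptions made for this example, and bond orders, charges, and
stereochemistry are ignored.

```python
from itertools import permutations


def canonical_form(atoms, bonds):
    """Toy canonicalizer: pick the lexicographically smallest reordering.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into atoms
    Brute force over all orderings, so only usable for very small graphs.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[new_index] == old_index; build the reverse lookup.
        new_of_old = {old: new for new, old in enumerate(perm)}
        ordered_atoms = tuple(atoms[old] for old in perm)
        ordered_bonds = tuple(sorted(
            tuple(sorted((new_of_old[i], new_of_old[j]))) for i, j in bonds
        ))
        candidate = (ordered_atoms, ordered_bonds)
        if best is None or candidate < best:
            best = candidate
    return best


# The same three heavy atoms (C-C-O), written in two different input orders.
a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])
print(a == b)
```

Both inputs describe the same connectivity, so running them through the same
function gives the same record. Comparing records produced by two different
canonicalizers gives no such guarantee, which is why recanonicalizing locally
with one fixed algorithm is the safer habit.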


> I think it will be valuable to see what other domains have to say about this.

The Blue Obelisk wiki had a page which pushed for open protocols, and gave 
examples of which protocols were and were not open. For the reasons I think 
I've now well described, I took issue with those viewpoints, and I'm grateful 
that the tone on that page has been moderated.

Still, I feel it would have been nice for BO to have come up with a consistent 
set of principles before making specific statements as to which protocols were 
and were not open.

Cheers,

                                Andrew
                                [email protected]



_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
