This answers a number of comments from both the BlueObelisk and OpenBabel
forums regarding the proposal to formally define comments in SMILES files.
To summarize my current opinion based on recent feedback:
1. A "#" character (not ';' or space) as the first character on a line
marks that entire line as a comment.
2. Users should be cautioned that this is a new standard, and many
parsers won't accept comments. Parsers should accept them, but
SMILES writers should avoid them, until the standard is widely
accepted.
3. If comments are included, the first line of the file should be
a file-type identifier: '#\#SMILES_1.0'.
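To make the three points concrete, here is a minimal Python sketch of a reader that follows them. It assumes one record per line and that blank lines can be ignored (neither is settled by the proposal), and the names `PROPOSED_HEADER` and `read_smiles_file` are hypothetical, not taken from any existing toolkit.

```python
# Identifier from point 3 above; raw string so the backslash is kept literally.
PROPOSED_HEADER = r"#\#SMILES_1.0"


def read_smiles_file(lines):
    """Return (has_header, records) for an iterable of text lines.

    A line is a comment only when '#' is its very first character.
    """
    has_header = False
    records = []
    for number, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if number == 0 and line == PROPOSED_HEADER:
            has_header = True
            continue
        if line.startswith("#"):
            continue  # comment line: skipped
        if line:  # assumption: blank lines are ignored
            records.append(line)
    return has_header, records


example = [
    r"#\#SMILES_1.0",
    "# a few nitriles",
    "C#N hydrogen cyanide",
    "CC#N acetonitrile",
]
has_header, records = read_smiles_file(example)
```

A file without the identifier line is still read; `has_header` simply comes back false, so the caller can decide whether to warn.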
Now to answer specific comments...
Peter Murray-Rust wrote:
> Please, Please don't use whitespace. It is so easy to lose or to
> generate by mistake.
Peter and several others pointed this out. I agree, a space is a poor choice.
Peter Murray-Rust wrote:
> It's generally a good idea NOT to use a character out of the language
> syntax for a comment. both hash and / are SMILES characters. There are a
> few others which I think are unused.
Actually, no. If you include reaction SMILES and SMARTS (which should also use
the same comment syntax), then the only unused character seems to be '|', the
vertical-bar or "pipe" character. That seems like a poor choice for comments
because of its importance in Unix/Linux shell programming.
It seems to me that any parser with half a brain should be able to figure this
out. It's not much of a trick to distinguish '#' at the start of a line from a
legitimate triple-bond symbol.
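As a rough illustration in Python, the test only needs to look at the first character of the line. This relies on my reading that a SMILES string always begins with an atom, so a '#' in the first position cannot be a bond; the function name `is_comment_line` is hypothetical.

```python
def is_comment_line(line):
    """True only when '#' is the first character of the line."""
    # Any '#' further along the line is left for the SMILES parser,
    # where it is the triple-bond symbol.
    return line[:1] == "#"


checks = {
    "# hydrogen cyanide": is_comment_line("# hydrogen cyanide"),
    "C#N": is_comment_line("C#N"),
    "": is_comment_line(""),
}
```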
Greg Landrum wrote:
> My two cents:
> I'd really like to see a distinction between SMILES -- a
> non-whitespace containing piece of text describing a molecule -- and a
> SMILES file -- which is, I guess, a bunch of SMILES, possibly with
> additional data, combined into one file.
Actually the OpenSMILES specification does distinguish the two. See "SMILES
Files":
http://opensmiles.org/spec/open-smiles-4-output.html#4.5
Greg Landrum wrote:
> If the goal is to get multiple molecules, with extra information, into
> one file I'd rather see an OpenTDT standard... TDT is an
> under-utilized (outside of Daylight) format that is quite useful.
I like TDTs too, but I think they've been outdated by XML. XML is more
verbose, but there are lots of great libraries to create and parse XML, and
databases support it directly. TDTs were ahead of their time.
Daniel Leidert wrote:
> ... If you formally allow comments in
> (Open)SMILES files, then please add a requirement to start the file with
> a comment line containing a file-type identifier (like e.g. CIF 1.1
> does) or we have just another format, which may start with a hundred or
> thousend lines long comment...
Good idea, see #3 in the summary at the top.
Andrew Dalke wrote:
> In general though, this proposal is incompatible with most existing
> SMILES parsers.
Another good observation, see #2 in the summary at the top.
Thanks to everyone, and I invite further comments.
Craig