Thanks very much Chris,

You are absolutely right that ref can (and perhaps should) be used for this
purpose. I wondered whether to raise it.

The power and the problem of references is that you can do almost anything
with them. That means that parsers have to be quite smart. We use @ref quite
a lot in our polymer builder where we use it to duplicate molecules prior to
joining them.

References can have two main semantics:
* reference semantics. Here the @ref is a pointer to the current state of
the atom, so that if it changes, so does the reference. I think this is not
very useful for exposed CML and is more likely to support the internals of a
program.
* copy semantics. This effectively takes a complete copy of the object and
after that the two are unsynched. For example:

<molecule id="m1">
  <atomArray id="aa1">
    <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
 >...

    <atom id="a1_1" ref="a1"/>

could take a complete copy of the atom a1. Subsequently any change to a1_1
would not affect a1. If attributes are present then they would add or
replace the information, e.g.

    <atom id="a2_1" ref="a2" isotopeNumber="37" x3="2.0"/>

would create:

    <atom id="a2_1" elementType="Cl" x3="2.0" y3="0.0" z3="0.0"
isotopeNumber="37"/>

This is a perfectly reasonable behaviour (apart from the change of
isotope!).
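
As a rough illustration only, copy semantics could be resolved along the
following lines. This is a sketch in Python using the standard
xml.etree.ElementTree module, not how JUMBO or any other CML toolkit does it.
The function name resolve_refs is made up for the example, and chained
references (a @ref pointing at another @ref) are deliberately left unresolved.

```python
import copy
import xml.etree.ElementTree as ET

CML = """
<molecule id="m1">
  <atomArray id="aa1">
    <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
    <atom id="a2_1" ref="a2" isotopeNumber="37" x3="2.0"/>
  </atomArray>
</molecule>
"""


def resolve_refs(root):
    """Expand every element carrying @ref using copy semantics.

    Hypothetical helper: the target is deep-copied, then the attributes
    written on the referring element are added to or replace the copied ones.
    """
    # Only elements that are not themselves references can be targets here.
    targets = {el.get("id"): el for el in root.iter()
               if el.get("id") is not None and el.get("ref") is None}
    # Snapshot the parents first so the tree is not mutated while iterating.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            target = targets.get(child.get("ref"))
            if target is None:
                continue  # no @ref, or an unknown/chained target: leave as is
            resolved = copy.deepcopy(target)
            for name, value in child.attrib.items():
                if name != "ref":
                    resolved.set(name, value)  # add or override
            resolved.tail = child.tail  # keep the original whitespace
            parent[index] = resolved
    return root


root = resolve_refs(ET.fromstring(CML))
print(ET.tostring(root.find(".//atom[@id='a2_1']"), encoding="unicode"))
```

Because the target is deep-copied, later edits to the resolved atom do not
touch the original, which is the "unsynched" behaviour described above.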

The main change really falls on the *programmers* in the Blue Obelisk. If they
are prepared to support reference-copy semantics I would be delighted. This
means that we can develop a set of archetypal data structures (e.g. for
conformers, etc.) which are agreed in the community and labelled by suitable
@convention.

So I'd summarise the alternatives as:
* complete explicit copies (my first suggestion). ++ easy to program, write
and read, and easy to understand. -- verbose
* atomar...@fooarray. terse representation. ++ terse and easy to
understand. -- limited to certain attributes and only works with rectangular
data (e.g. atoms cannot have children easily); -- needs programming.
* @ref copy-semantics. ++ reasonably terse and elegant, easy to understand.
Probably easy to write. -- harder to program.

All these are possible. It may be that different applications require
different approaches - e.g. conformers can use *array easily, other
applications may not find it so easy.
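
For the array form, the extra programming is mostly a matter of splitting the
lists and pairing them up. The following Python sketch shows one way that
might look, using the x3Array/y3Array/z3Array attribute names from the
atomArray example quoted below. The function expand_atom_array is
hypothetical, and the length check reflects the "rectangular data" limitation:
every array must supply exactly one value per atom.

```python
import xml.etree.ElementTree as ET

ARRAY_NAMES = ("x3Array", "y3Array", "z3Array")


def expand_atom_array(atom_array):
    """Turn whitespace-separated coordinate arrays into (x, y, z) tuples.

    Hypothetical helper: returns one tuple per atom, in array order.
    """
    columns = []
    for name in ARRAY_NAMES:
        text = atom_array.get(name)
        if text is None:
            raise ValueError("missing attribute " + name)
        columns.append([float(token) for token in text.split()])
    # The data must be rectangular: one x, one y and one z per atom.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("coordinate arrays differ in length")
    return list(zip(*columns))


element = ET.fromstring(
    '<atomArray x3Array="1 2 3 4 5 6" '
    'y3Array="9 8 7 6 5 4" z3Array="1 9 1 9 1 9"/>'
)
for x, y, z in expand_atom_array(element):
    print(x, y, z)
```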

I'd value your thoughts - I have no fixed ideas here, but would hope that we
could get convergence on communal views.

For various reasons this comes at a very good time and we are starting to
see CML applications in a number of areas. Our experience in Chem4Word has
been very useful and we have a much clearer idea of how CML can be used.

On Tue, Aug 4, 2009 at 7:31 AM, Chris Morley <[email protected]> wrote:

> Is this a place to use the ref attribute? "...This is similar to a
> pointer and it can be thought of a strongly typed hyperlink. It may
> also be used for "subclassing" or "overriding" elements..."
>
> <!--main molecule-->
> <molecule id="m1">
>   <atomArray id="aa1">
>     <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
>     <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
>  >...
>
> <molecule ref="m1" id="conformer1" convention="bo:conformer">
>     <atom ref="a1" x3="0.9" y3="0.7" z3="0.5"/>
>     <atom ref="a2" x3="1.6" y3="0.4" z3="0.2"/>
> ...
>
> Only the atoms that were different would need to be specified; there
> could be sub-sub classing; putting convention on the molecule allows
> multiple molecules in the file. It would be backward compatible if
> parsers were to ignore an element with the ref attribute unless they
> knew how to handle it. Such a parser would see only the main
> molecule(s). Other atom and bond properties could maybe be treated in
> the same way: stereo and other isomers, tautomers, etc.
>
> Chris
>
> Peter Murray-Rust wrote:
> > Thanks everyone for the discussion and contributions.
> > Sam has given a good example of how energies can be held for each set of
> > conformers (which can also include molecular dynamics and optimisation
> > by compChem), and we use this approach in analysing QM output.
> >
> > It's certainly possible to hold the coordinates in a <matrix> but it
> > loses the transparency of XML. It makes it difficult to search the file
> > for semantics - there is nothing to indicate what the information means
> > (the matrix could hold vibrational frequencies, kpoints, whatever).
> >
> > It's certainly possible to hold the coordinates in atoms without the
> > bonds or element types. For example
> >
> > <cml convention="bo:...">
> > <molecule id="m1">
> >   <atomArray id="aa1">
> >     <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
> >     <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
> >   </atomArray>
> >   <bondArray>
> >     <bond id="b1" atomRefs2="a1 a2"/>
> >   </bondArray>
> > </molecule>
> >
> > <molecule>
> >   <propertyList>... conformer stuff ... </propertyList>
> >   <atomArray id="aa2">
> >     <atom id="a1" x3="0.0" y3="0.0" z3="0.0"/>
> >     <atom id="a2" x3="1.6" y3="0.0" z3="0.0"/>
> >   </atomArray>
> > </molecule>
> >
> > <molecule>
> >   <propertyList>... conformer stuff ... </propertyList>
> >   <atomArray id="aa3">
> >     <atom id="a1" x3="0.0" y3="0.0" z3="0.0"/>
> >     <atom id="a2" x3="1.7" y3="0.0" z3="0.0"/>
> >   </atomArray>
> > </molecule>
> >
> > </cml>
> >
> > However if you are still worried about size it's possible to use the
> > array form of the atomArray:
> >
> > <atomArray
> >   x3Array="1 2 3 4 5 6"
> >   y3Array="9 8 7 6 5 4"
> >   z3Array="1 9 1 9 1 9"/>
> >
> > This will hold exactly the x3 y3 z3 coordinates with complete semantics
> > and almost no verbosity. It requires a bit more programming and is less
> > semantic.
> >
> >
> >
> > There is always a balance between terseness and explicitness. It's
> > tempting to remove all markup and use the known sequence of numbers to
> > define the object. This gives something like:
> >
> > <cml>
> > <molecule id="m1">
> >   <atomArray id="aa1">
> >     <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
> >     <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
> >   </atomArray>
> >   <bondArray>
> >     <bond id="b1" atomRefs2="a1 a2"/>
> >   </bondArray>
> >
> >   <propertyList>... conformer stuff ... </propertyList>
> > <matrix>0.0 0.0 0.0 1.5 0.0 0.0</matrix>
> >
> >   <propertyList>... conformer stuff ... </propertyList>
> > <matrix>0.0 0.0 0.0 1.6 0.0 0.0</matrix>
> >
> >   <propertyList>... conformer stuff ... </propertyList>
> > <matrix>0.0 0.0 0.0 1.7 0.0 0.0</matrix>
> > </cml>
> >
> > The difficulty of this is that there is no explicit markup and anyone
> > who doesn't know the convention would have to guess at the order of
> > rows, etc. It's harder to use semantic search engines to find
> > information. The primary points of CML are:
> > * to make it explicit to human readers what the information is.
> > * to balance flexibility against robustness
> > * to make it easier to write software of high quality.
> > Remember that in CML the order of atoms is not defined (their identity
> > comes from their ids). Many of the current problems of quality in
> > chemoinformatics come from guessed semantics and over-fluid semantics.
> > CML offers an increase in robustness and ease of programming at a
> > relatively small cost in filesize (especially when compressed).
> >
> >
> > I had a typical example yesterday. We were creating MOL files from CML
> > and got the second numeric field in the atom record wrong. It holds the
> > charges and spin in a transformed and completely opaque way. It wasted
> > time and we narrowly avoided filling the repository with junk. With CML
> > that would have been impossible but we had to use a MOL file. I'd argue
> > that writing:
> > formalCharge="1"
> > instead of a left-justified 3 (or is it 5) is a justification for using
> > a few more characters.
> >
> > I think the atomArray may be the best solution. I haven't used it
> > recently but it's completely supported in JUMBO.
> >
> > P.
> >
> >
> > --
> > Peter Murray-Rust
> > Reader in Molecular Informatics
> > Unilever Centre, Dep. Of Chemistry
> > University of Cambridge
> > CB2 1EW, UK
> > +44-1223-763069
> >
> >
> > ------------------------------------------------------------------------
> >
> >
> ------------------------------------------------------------------------------
> > Let Crystal Reports handle the reporting - Free Crystal Reports 2008
> 30-Day
> > trial. Simplify your report design, integration and deployment - and
> focus on
> > what you do best, core application coding. Discover what's new with
> > Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > cml-discuss mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/cml-discuss
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
