Thanks everyone for the discussion and contributions.
Sam has given a good example of how energies can be held for each set of
conformers (which can also include molecular dynamics and optimisation by
compChem), and we use this approach in analysing QM output.

It's certainly possible to hold the coordinates in a <matrix> but it loses
the transparency of XML. It makes it difficult to search the file for
semantics - there is nothing to indicate what the information means (the
matrix could hold vibrational frequencies, kpoints, whatever).

It's certainly possible to hold the coordinates in atoms without the bonds
or element types. For example:

<cml convention="bo:...">
<molecule id="m1">
  <atomArray id="aa1">
    <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2"/>
  </bondArray>
</molecule>

<molecule>
  <propertyList>... conformer stuff ... </propertyList>
  <atomArray id="aa2">
    <atom id="a1" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a2" x3="1.6" y3="0.0" z3="0.0"/>
  </atomArray>
</molecule>

<molecule>
  <propertyList>... conformer stuff ... </propertyList>
  <atomArray id="aa3">
    <atom id="a1" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a2" x3="1.7" y3="0.0" z3="0.0"/>
  </atomArray>
</molecule>

</cml>

However if you are still worried about size it's possible to use the array
form of the atomArray:

<atomArray
  x3Array="1 2 3 4 5 6"
  y3Array="9 8 7 6 5 4"
  z3Array="1 9 1 9 1 9"/>

This holds exactly the x3, y3 and z3 coordinates, each labelled as such, with
almost no verbosity. It requires a bit more programming and is less explicit
than one element per atom.
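
The "bit more programming" is modest. A minimal sketch in Python, assuming the
fragment above with no XML namespace (real CML files usually declare one, which
would need handling) and a hypothetical helper name read_coordinate_arrays:

```python
import xml.etree.ElementTree as ET

# The array-form fragment from the message, unchanged.
CML_FRAGMENT = """
<atomArray
  x3Array="1 2 3 4 5 6"
  y3Array="9 8 7 6 5 4"
  z3Array="1 9 1 9 1 9"/>
"""

def read_coordinate_arrays(atom_array):
    """Return one (x, y, z) tuple per atom from an array-form atomArray.

    Hypothetical helper for illustration; it is not part of JUMBO or any
    CML library.
    """
    columns = []
    for name in ("x3Array", "y3Array", "z3Array"):
        raw = atom_array.get(name)
        if raw is None:
            raise ValueError(f"atomArray has no {name} attribute")
        # Values are whitespace-separated numbers.
        columns.append([float(token) for token in raw.split()])
    # All three arrays must describe the same number of atoms.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("x3Array, y3Array and z3Array differ in length")
    return list(zip(*columns))

coords = read_coordinate_arrays(ET.fromstring(CML_FRAGMENT))
print(len(coords), "atoms; first atom at", coords[0])
```

The attribute names still say what each number is, so no row or column order
has to be guessed.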



There is always a balance between terseness and explicitness. It's tempting
to remove all markup and use the known sequence of numbers to define the
object. This gives something like:

<cml>
<molecule id="m1">
  <atomArray id="aa1">
    <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2"/>
  </bondArray>

  <propertyList>... conformer stuff ... </propertyList>
<matrix>0.0 0.0 0.0 1.5 0.0 0.0</matrix>

  <propertyList>... conformer stuff ... </propertyList>
<matrix>0.0 0.0 0.0 1.6 0.0 0.0</matrix>

  <propertyList>... conformer stuff ... </propertyList>
<matrix>0.0 0.0 0.0 1.7 0.0 0.0</matrix>
</molecule>
</cml>

The difficulty with this is that there is no explicit markup and anyone who
doesn't know the convention would have to guess at the order of rows, etc.
It's harder to use semantic search engines to find information. The primary
points of CML are:
* to make it explicit to human readers what the information is
* to balance flexibility against robustness
* to make it easier to write software of high quality
Remember that in CML the order of atoms is not defined (their identity comes
from their ids). Many of the current problems of quality in chemoinformatics
come from guessed semantics and over-fluid semantics. CML offers an increase
in robustness and ease of programming at a relatively small cost in file size
(especially when compressed).
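
To illustrate identity by id rather than by position, here is a minimal sketch
in Python. It assumes un-namespaced CML, and the second molecule (with the
hypothetical id "conf2") deliberately lists its atoms in the opposite order
from the reference molecule:

```python
import xml.etree.ElementTree as ET

# Reference molecule plus one conformer whose atoms are listed in reverse order.
# The id "conf2" is made up for this illustration.
CML = """
<cml>
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="H" x3="0.0" y3="0.0" z3="0.0"/>
      <atom id="a2" elementType="Cl" x3="1.5" y3="0.0" z3="0.0"/>
    </atomArray>
  </molecule>
  <molecule id="conf2">
    <atomArray>
      <atom id="a2" x3="1.6" y3="0.0" z3="0.0"/>
      <atom id="a1" x3="0.0" y3="0.0" z3="0.0"/>
    </atomArray>
  </molecule>
</cml>
"""

root = ET.fromstring(CML)
reference, conformer = root.findall("molecule")

# Element types live only on the reference molecule, keyed by atom id.
element_by_id = {
    atom.get("id"): atom.get("elementType") for atom in reference.iter("atom")
}

# Attach the element type to each conformer atom through its id, not its position.
labelled = {}
for atom in conformer.iter("atom"):
    atom_id = atom.get("id")
    xyz = tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
    labelled[atom_id] = (element_by_id[atom_id],) + xyz

print(labelled["a2"])
```

With a bare list of numbers the reversed order would silently swap H and Cl;
with ids the lookup gives the same answer whatever the order.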


I had a typical example yesterday. We were creating MOL files from CML and
got the second numeric field in the atom record wrong. It holds the charges
and spin in a transformed and completely opaque way. It wasted time and we
narrowly avoided filling the repository with junk. With CML that would have
been impossible but we had to use a MOL file. I'd argue that writing:
formalCharge="1"
instead of a left-justified 3 (or is it 5?) is a justification for using a
few more characters.
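
For comparison, a minimal sketch in Python of the opaque mapping involved. The
table is the charge code of the V2000 atom block as usually documented for
CTfiles (worth checking against the specification before relying on it); the
function names are hypothetical:

```python
# V2000 atom-block charge codes mapped to formal charges.
# Code 4 means "doublet radical", not a charge, so it maps to charge 0 here.
MOL_CODE_TO_CHARGE = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}

# Inverse table for writing; code 4 is left out because it is not a charge.
CHARGE_TO_MOL_CODE = {
    charge: code for code, charge in MOL_CODE_TO_CHARGE.items() if code != 4
}

def mol_charge_code(formal_charge):
    """Return the V2000 code for a formal charge in the range -3..+3."""
    try:
        return CHARGE_TO_MOL_CODE[formal_charge]
    except KeyError:
        raise ValueError(
            f"charge {formal_charge} cannot be stored in the atom block"
        ) from None

def formal_charge_from_code(code):
    """Return the formal charge for a V2000 code in the range 0..7."""
    return MOL_CODE_TO_CHARGE[code]

print(mol_charge_code(1), mol_charge_code(-1))
```

Under this table formalCharge="1" is written as 3 and formalCharge="-1" as 5,
which is exactly the kind of thing nobody should have to remember.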

I think the atomArray may be the best solution. I haven't used it recently
but it's completely supported in JUMBO.

P.


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
