Re: [Avogadro-devel] CJSON format proposal

Boone, Paul Tue, 22 Nov 2016 13:07:30 -0800

On November 21, 2016 at 11:25:40 AM, Marcus D. Hanwell wrote:
Something like (made up and not validated):


{
  "molecule": "myMolecule",
  "inchi": "correctInChI_here",
  "name": "friendlyName",
  "atoms": [
    { "atomicNumber": 4, "atomicSymbol": "Be", "x3": 1.1, "y3": 1.1, "z3": 0.0,
      "label": "a1", "customLabel": "Bob" },
    { "atomicNumber": 6, "atomicSymbol": "C", "x3": 0.0, "y3": 0.0, "z3": 0.0,
      "label": "a2" }
  ],
  "bonds": [ { "label": "b1", "order": 1, "connections": [ "a1", "a2" ] } ],
  "properties": { "propertyKey": "valuePair" }
}

I agree that an object like the one above is a fairly standard, verbose form
of JSON, and it has some obvious advantages. For example, in the current
structure, if you add a new set of atom coords but forget to add the element
number, you cannot tell which atom is missing its element. In the JSON above
you can tell immediately, because all the data for an atom is grouped
together.
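
To make that concrete, here is a minimal sketch in Python. The helper name
and the list of required keys are my own assumptions for illustration, built
on the made-up key names from the example above; none of this is part of any
CJSON tooling.

```python
# Hypothetical helper: in an object-per-atom layout, report which atoms
# lack a required key. Key names follow the made-up example above.
REQUIRED_KEYS = ("atomicNumber", "x3", "y3", "z3")


def atoms_missing_fields(molecule, required=REQUIRED_KEYS):
    """Map each incomplete atom (by label, else by position) to its missing keys."""
    problems = {}
    for position, atom in enumerate(molecule.get("atoms", [])):
        missing = [key for key in required if key not in atom]
        if missing:
            problems[atom.get("label", position)] = missing
    return problems


molecule = {
    "atoms": [
        {"atomicNumber": 4, "x3": 1.1, "y3": 1.1, "z3": 0.0, "label": "a1"},
        {"x3": 0.0, "y3": 0.0, "z3": 0.0, "label": "a2"},  # element forgotten
    ]
}

print(atoms_missing_fields(molecule))  # {'a2': ['atomicNumber']}
```

With separate flat arrays, the same mistake only shows up as two arrays of
different lengths, with no way to say which atom is the incomplete one.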

But I definitely wasn’t trying to encourage a redesign of the entire CJSON
format (which seems like huge overkill), which is why all I suggested was
breaking the atom coords and the bonds into tuples so they’re grouped. That
would be enough to make the format explicit, and allow us to index all the
field arrays (i.e. atom coords, atom elements, etc.) by the same index to get
all the fields associated with that atom. So 'coords[3]' would refer to the
coords of the fourth atom (counting from zero), and 'element[3]' would refer
to the element of that same atom.
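
A rough sketch of the tuple idea in Python, assuming the current layout keeps
coords as one flat array of numbers and bonds as one flat array of atom
indices. The function name and the sample data are made up for illustration,
and in the JSON itself the tuples would simply be nested arrays.

```python
def group_into_tuples(flat_values, width):
    """Split a flat list into consecutive groups of `width` items."""
    if len(flat_values) % width != 0:
        raise ValueError(
            f"expected a multiple of {width} values, got {len(flat_values)}"
        )
    return [
        tuple(flat_values[start:start + width])
        for start in range(0, len(flat_values), width)
    ]


# Made-up data for three atoms (Be, C, H) and two bonds.
elements = [4, 6, 1]
flat_coords = [1.1, 1.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
flat_bonds = [0, 1, 1, 2]

coords = group_into_tuples(flat_coords, 3)
bonds = group_into_tuples(flat_bonds, 2)

# Every per-atom array can now be checked against the others by length...
assert len(coords) == len(elements)

# ...and read with one shared index.
index = 1
print(elements[index], coords[index])  # 6 (0.0, 0.0, 0.0)
print(bonds)  # [(0, 1), (1, 2)]
```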

I’m ok with this being a variation of the format (i.e. adding this as an 
optional tuple version, along with the version bump) if that makes the most 
sense to you all. I still don’t have a great idea where else this format is 
being used so I’m happy to leave the final decision to all of you who know 
better where and how CJSON is being used.

Paul







You could condense the x, y, z into a vector of length 3. You can then
just query each atom/bond object, but you still need some API to do the
lookup of bond connections to atoms. At which point I wonder if you
might be better off just developing a little utility code to sit above
the format.
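
As an illustration of what such utility code might look like, here is a
minimal Python sketch that resolves bond connections to atom objects. It
assumes the object-per-atom layout and the made-up key names from the example
earlier in this thread; the function name is hypothetical.

```python
from collections import defaultdict


def build_neighbor_map(molecule):
    """Map each atom label to the list of atom objects it is bonded to."""
    atoms_by_label = {atom["label"]: atom for atom in molecule["atoms"]}
    neighbors = defaultdict(list)
    for bond in molecule["bonds"]:
        first, second = bond["connections"]
        # A KeyError here means a bond refers to a label no atom carries.
        neighbors[first].append(atoms_by_label[second])
        neighbors[second].append(atoms_by_label[first])
    return dict(neighbors)


molecule = {
    "atoms": [
        {"atomicNumber": 4, "atomicSymbol": "Be", "label": "a1"},
        {"atomicNumber": 6, "atomicSymbol": "C", "label": "a2"},
    ],
    "bonds": [{"label": "b1", "order": 1, "connections": ["a1", "a2"]}],
}

neighbors = build_neighbor_map(molecule)
print([atom["atomicSymbol"] for atom in neighbors["a1"]])  # ['C']
```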

I can see the temptation to use JSON as an interface due to the
Python/JavaScript language support. Having gone down the path of
"everything is an object" before, I wanted to pursue a different path,
but I see some utility in the first-year grad student model. I think for
me I would focus on two distinct use cases rather than try to put
everything into one approach; it should be relatively simple to move
between them.

I am trying to finish writing up a paper that among other things looks
at these two different approaches, but I guess I err on the side of
minimalism. This type of thing is more amenable to the "use the format
as the API" style of approach, and if I spent more time in
Python/JavaScript it may be something I would be more inclined to use.
It wouldn't be hard to support both, and part of me would like to
benchmark and look at storage implications out of curiosity.
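
For the storage side, a quick and rough comparison could look like the Python
sketch below. It only measures the length of the serialized text for made-up
carbon atoms, not parse time or memory, and the key names in both layouts are
placeholders rather than the actual CJSON schema.

```python
import json


def serialized_sizes(atom_count):
    """Return the serialized length, in characters, of two layouts of the same atoms."""
    elements = [6] * atom_count
    positions = [(float(i), 0.0, 0.0) for i in range(atom_count)]

    # Layout 1: one array per field, coordinates flattened.
    array_layout = {
        "atoms": {
            "elements": elements,
            "coords": [value for xyz in positions for value in xyz],
        }
    }
    # Layout 2: one object per atom, every key repeated for each atom.
    object_layout = {
        "atoms": [
            {"atomicNumber": number, "x3": x, "y3": y, "z3": z}
            for number, (x, y, z) in zip(elements, positions)
        ]
    }

    compact = (",", ":")  # no extra whitespace in either output
    return {
        "arrays": len(json.dumps(array_layout, separators=compact)),
        "objects": len(json.dumps(object_layout, separators=compact)),
    }


sizes = serialized_sizes(1000)
print(sizes)
```

The object layout comes out larger because every key is repeated for every
atom; how much that matters after compression would need its own measurement.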

