Hi Marcus,

I completely agree. I think "overly verbose but flexible" is the definition of JSON, and its verbosity can be frustrating at times. It's also messy in statically typed languages, particularly the typecasting needed as you unravel highly nested structures.

Supporting a maximally readable JSON would help out the Python/JS scripting folks, but will inevitably add some overhead and annoyance to the C++ workflow. Maybe the right answer is that JSON/mongo will never be the right answer for high performance. If a project gets to a point where you're profiling file I/O, then maybe JSON should be replaced by e.g. a simple serialized vector.

Pat

On Mon, Nov 21, 2016 at 10:25 AM, Marcus D. Hanwell <marcus.hanw...@kitware.com> wrote:

> On Mon, Nov 21, 2016 at 10:47 AM, Patrick Fuller
> <patrickful...@gmail.com> wrote:
> > For what it's worth, I thought I'd try to add my outside opinion to the
> > discussion.
> >
> > Where I'm coming from - I don't think that performance has ever been the
> > motivation behind the JSON format. I view JSON as developer friendly and
> > easy to implement, but also as the first thing to abandon if performance
> > becomes limiting. Even optimized BSON structures (I think) require at least
> > one hash lookup, so it'll never be as efficient as the O(1) you get with a
> > flat array filled with reference indices.
> >
> > In that spirit, my opinion would be that CJSON should focus on developer
> > friendliness at the cost of performance. CJSON should make it easier for a
> > first-year grad student to write useful code, even if it's obscenely slow
> > and/or requires some 2TB HDDs.
> >
> Thank you for adding your thoughts. If you want to push in that
> direction shouldn't we develop a format that treats atoms and bonds as
> objects, with labels, and offers some redundancy in favor of ease of
> use. This is largely where CML went with its representation, I can see
> the utility but it isn't the style of format I wanted to work with and
> feels overly verbose if very flexible.
>
> Something like (made up and not validated):
>
> {
>   "molecule": "myMolecule",
>   "inchi": "correctInChI_here",
>   "name": "friendlyName",
>   "atoms": [ { "atomicNumber": 4, "atomicSymbol": "Be", "x3": 1.1,
>                "y3": 1.1, "z3": 0.0, "label": "a1", "customLabel": "Bob" },
>              { "atomicNumber": 6, "atomicSymbol": "C", "x3": 0.0,
>                "y3": 0.0, "z3": 0.0, "label": "a2" } ],
>   "bonds": [ { "label": "b1", "order": 1, "connections": [ "a1", "a2" ] } ],
>   "properties": { "propertyKey": "valuePair" }
> }
>
> You could condense the x, y, z into a vector of length 3. You can then
> just query each atom/bond object, you still need some API to do the
> lookup of bond connections to atoms. At which point I wonder if you
> might be better off just developing a little utility code to sit above
> the format.
>
> I can see the temptation to use JSON as an interface due to the
> Python/JavaScript language support. Having gone down the path of
> everything is an object before I wanted to pursue a different path,
> but see some utility in the first year grad student model. I think for
> me I would focus on two distinct use cases rather than try to put
> everything into one approach - it should be relatively simple to move
> between them.
>
> I am trying to finish writing up a paper that among other things looks
> at these two different approaches, but I guess I err on the side of
> minimalism. This type of thing is more amenable to the use the format
> as API style approach, and if I spent more time in Python/JavaScript
> may be something I would be more inclined to use. It wouldn't be hard
> to support both, and part of me would like to benchmark and look at
> storage implications out of curiosity.
>
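
The "little utility code to sit above the format" idea, and the point that moving between the two styles should be relatively simple, can be sketched in Python. This is only an illustration under stated assumptions: the function names (index_atoms, bonded_atoms, to_flat) and the flat key names (atomicNumbers, coords3d, bondIndices, bondOrders) are made up here and are not taken from the CJSON schema or any Avogadro API. The input is a trimmed copy of the made-up object-style example in the quoted message.

```python
import json

# Object-style document, trimmed from the made-up example in the quoted message.
OBJECT_STYLE = json.loads("""
{
  "molecule": "myMolecule",
  "atoms": [
    {"atomicNumber": 4, "atomicSymbol": "Be",
     "x3": 1.1, "y3": 1.1, "z3": 0.0, "label": "a1"},
    {"atomicNumber": 6, "atomicSymbol": "C",
     "x3": 0.0, "y3": 0.0, "z3": 0.0, "label": "a2"}
  ],
  "bonds": [
    {"label": "b1", "order": 1, "connections": ["a1", "a2"]}
  ]
}
""")


def index_atoms(doc):
    """Map each atom label to its position in the "atoms" list."""
    return {atom["label"]: i for i, atom in enumerate(doc["atoms"])}


def bonded_atoms(doc, bond_label):
    """Return the atom objects joined by the bond with the given label."""
    index = index_atoms(doc)
    for bond in doc["bonds"]:
        if bond["label"] == bond_label:
            return [doc["atoms"][index[label]] for label in bond["connections"]]
    raise KeyError(bond_label)


def to_flat(doc):
    """Convert to flat arrays where bonds refer to atoms by integer index.

    The key names below are hypothetical, chosen only for this sketch.
    """
    index = index_atoms(doc)
    atoms = doc["atoms"]
    bonds = doc["bonds"]
    return {
        "atomicNumbers": [a["atomicNumber"] for a in atoms],
        # x, y, z of every atom, concatenated in atom order.
        "coords3d": [c for a in atoms for c in (a["x3"], a["y3"], a["z3"])],
        # Two atom indices per bond, concatenated in bond order.
        "bondIndices": [index[label] for b in bonds for label in b["connections"]],
        "bondOrders": [b["order"] for b in bonds],
    }


flat = to_flat(OBJECT_STYLE)
print([a["atomicSymbol"] for a in bonded_atoms(OBJECT_STYLE, "b1")])
print(flat["bondIndices"])
```

In this sketch the object style pays for its readability with one dictionary lookup per bond connection, while the flat form can index its arrays directly, which is the trade-off discussed in the thread.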
_______________________________________________
Avogadro-devel mailing list
Avogadro-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/avogadro-devel