Can you explain more how BSON fits in here? If CJSON were supposed to be a file format for internal interchange, then I have nothing against storing coords as one array and then processing them when you read it in or out.
For the python interface though, we were going to use CJSON as a public interchange format, and for a public interface, I’d be adamant about sticking to the principles of (1) readability (i.e. the format making sense to somebody just reading the text format) and (2) explicitness (i.e. the structure of the file should represent the underlying data, without needing to interpret it in any way). Otherwise, we’re letting our internal implementation determine the structure of the format, when we want the semantics of the underlying data to determine the format.

I specifically wouldn’t worry about space considerations of the sub-arrays, but I don’t ever worry about space for JSON since it just wasn’t intended for that. Right now I think the biggest CJSON file we’re testing with for the python interface is about 117k, which I don’t think of as large. But I have no insight into how this format is being used elsewhere… Are you using it for really large structures?

So the fundamental question for me that I’m sure you all can provide some insight on is:

- are there really two formats here: (1) a format designed for internal use, ease of importing / exporting straight to/from avogadro internal structures, and optimized for minimal size, likely using BSON and (2) a public format designed for readability and semantic explicitness?
- or, is the public format sufficient for both purposes?

You also mention some additional changes you were thinking of making. Can you tell me about those?

Thanks!

--
Paul Boone
Lead Developer, Wilmer Lab | http://wilmerlab.com/
Center for Simulation and Modeling | https://www.sam.pitt.edu/
University of Pittsburgh

On November 18, 2016 at 11:53:21 AM, Marcus D. Hanwell (marcus.hanw...@kitware.com) wrote:

On Fri, Nov 18, 2016 at 11:11 AM, Boone, Paul <paulbo...@pitt.edu> wrote:
>
> I’m working with Geoff on the python plugin architecture. There are a couple of small changes to the cjson format I’d like to propose.
> Currently, the cjson has structure that is implied but not explicit in the file itself, and this forces an adopter of the file to extrapolate the format instead of just reading it. Making the structure explicit will make it much easier to use the cjson format, as well as making it more intuitive when looking at or editing the file directly.
>
> The changes would be:
>
> (1) group atom coordinates by atom:
>
> i.e.:
> "3d": [
>   [1,2,3],
>   [1,2,5],
>   etc
> ]
>
> instead of:
> "3d": [
>   1,2,3,1,2,5,
>   etc
> ]
>
> (2) group bonds by bond:
>
> i.e.:
>
> "index" :
> [
>   [0,1],
>   [0,34],
>   [0,35],
>   ...
> ]
>
> instead of:
>
> "index" :
> [
>   0,
>   1,
>   0,
>   34,
>   0,
>   35,
>   ...
> ]
>
> When the file semantically reflects the actual structure, we can just use the cjson as-is without doing anything. Currently, where it doesn’t reflect the actual structure, I have to do list comprehensions that are not terribly intuitive to marshall the structure back and forth.
>

One of the concerns when developing this was storage in BSON, where each array needs a type, a length, and then the values. Having many short arrays is not very efficient. The flat layout also makes the mapping from Avogadro's internal storage very simple, as it is much more efficient to store connectivity, coordinates etc. in a contiguous block which is just written to a JSON array. I think developing small amounts of C++, Python and JavaScript API to interface with the raw arrays is reasonable.

If people feel very strongly about making tuples, the extra space in a JSON text block isn't too much of a concern, but there is existing code that I would like to continue supporting for at least the short term. If we change to tuples we should bump the format number, and keep code to read/write version 0 in the reader/writer to support older files. There are a number of other changes we are considering too.
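
To put rough numbers on the BSON concern raised above, here is a minimal sketch that estimates encoded size from the layout in the BSON specification: an array is stored as an embedded document (4-byte length, the elements, a trailing NUL), and each element carries a type byte plus its decimal index as a NUL-terminated key. The function names are hypothetical and belong to neither Avogadro nor any BSON library. The sketch assumes 8-byte doubles for coordinates and ignores the enclosing document, so treat the results as an estimate only.

```python
def bson_flat_array_size(count, value_size=8):
    """Estimated bytes for one BSON array of `count` fixed-size scalars."""
    # Per element: type byte + index key (decimal digits + NUL) + payload.
    body = sum(1 + len(str(i)) + 1 + value_size for i in range(count))
    # Array framing: int32 length prefix + elements + trailing NUL.
    return 4 + body + 1


def bson_nested_array_size(groups, group_len, value_size=8):
    """Estimated bytes for an array of `groups` sub-arrays of `group_len` scalars."""
    # Each sub-array is itself a full BSON array, so it pays the 5-byte
    # framing again, plus its own type byte and key in the outer array.
    inner = bson_flat_array_size(group_len, value_size)
    body = sum(1 + len(str(i)) + 1 + inner for i in range(groups))
    return 4 + body + 1


n_atoms = 1000  # arbitrary example size, not taken from a real file
flat = bson_flat_array_size(3 * n_atoms)
nested = bson_nested_array_size(n_atoms, 3)
print(f"{n_atoms} atoms: flat={flat} bytes, nested={nested} bytes")
```

Running this for a few atom counts shows how much of the difference comes from per-array framing and how much from the index keys, which grow with the array length in the flat layout.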
Alternatively we could add a new key for tuples and output that for the Python code, retaining the flat arrays for compatibility with existing code. It seems like a reasonable convenience function.
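
As an illustration of what such a convenience function might look like on the Python side, here is a minimal sketch that converts between the flat arrays and the grouped form. The names `group` and `flatten` are hypothetical, not part of any existing Avogadro API, and the sample values are the ones from the proposal quoted above.

```python
def group(flat, size):
    """Split a flat list into consecutive sub-lists of `size` items."""
    if len(flat) % size != 0:
        raise ValueError(f"length {len(flat)} is not a multiple of {size}")
    return [flat[i:i + size] for i in range(0, len(flat), size)]


def flatten(groups):
    """Inverse of group(): concatenate the sub-lists into one flat list."""
    return [value for item in groups for value in item]


# Flat arrays as they appear in the current format.
flat_coords = [1, 2, 3, 1, 2, 5]
flat_bonds = [0, 1, 0, 34, 0, 35]

coords = group(flat_coords, 3)   # one [x, y, z] per atom
bonds = group(flat_bonds, 2)     # one [begin, end] per bond
```

With helpers like these the file could keep its flat arrays for existing readers, while the Python code works with per-atom and per-bond lists and calls `flatten` before writing back out.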
------------------------------------------------------------------------------
_______________________________________________
Avogadro-devel mailing list
Avogadro-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/avogadro-devel