Re: [Avogadro-devel] CJSON format proposal

Boone, Paul Sun, 20 Nov 2016 09:33:08 -0800

Can you explain a bit more about how BSON fits in here? If CJSON is meant to be a file
format for internal interchange, then I have nothing against storing coords as
one flat array and then processing them when you read the file in or write it out.

For the Python interface, though, we were going to use CJSON as a public
interchange format, and for a public interface I’d be adamant about sticking
to the principles of (1) readability (i.e. the format makes sense to somebody
just reading the text) and (2) explicitness (i.e. the structure of
the file should represent the underlying data, without needing to interpret it
in any way). Otherwise, we’re letting our internal implementation determine the
structure of the format, when we want the semantics of the underlying data to
determine it.

In this case I wouldn’t worry about the space overhead of the sub-arrays, but then
I never worry about space with JSON, since it just wasn’t designed for compactness.
Right now I think the biggest CJSON file we’re testing with for the Python
interface is about 117 KB, which I don’t consider large. But I have no insight
into how this format is being used elsewhere… Are you using it for really large
structures?
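To put a rough number on the text-size question, here is a minimal Python sketch (standard-library `json` only; the coordinates are made up for illustration, and `flat_and_nested_sizes` is a hypothetical helper). It serializes the same coordinates in the current flat layout and in the proposed per-atom layout. Both layouts have the same number of separators, so the nested form costs exactly one extra pair of brackets per atom.

```python
import json


def flat_and_nested_sizes(coords_per_atom):
    """Return (flat_len, nested_len) in characters for the same coordinates.

    coords_per_atom: list of [x, y, z] lists (illustrative data only).
    """
    # Current layout: one contiguous array x0, y0, z0, x1, y1, z1, ...
    flat = [value for atom in coords_per_atom for value in atom]
    # Proposed layout: one three-element sub-array per atom.
    nested = [list(atom) for atom in coords_per_atom]
    return len(json.dumps(flat)), len(json.dumps(nested))


if __name__ == "__main__":
    atoms = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0]]
    flat_len, nested_len = flat_and_nested_sizes(atoms)
    print(flat_len, nested_len, nested_len - flat_len)
```

For an N-atom structure the nested text is therefore 2N characters longer, e.g. 2,000 characters for 1,000 atoms, which is small next to the digits of the coordinates themselves.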

So the fundamental question for me, which I’m sure you all can provide some
insight on, is:

- are there really two formats here: (1) a format designed for internal use,
easy to import / export straight to/from Avogadro’s internal structures,
and optimized for minimal size, likely using BSON, and (2) a public format
designed for readability and semantic explicitness?
- or is the public format sufficient for both purposes?

You also mentioned some other changes you were thinking of making. Can you
tell me more about those?

Thanks!

--
Paul Boone
Lead Developer, Wilmer Lab | http://wilmerlab.com/
Center for Simulation and Modeling | https://www.sam.pitt.edu/
University of Pittsburgh

On November 18, 2016 at 11:53:21 AM, Marcus D. Hanwell
(marcus.hanw...@kitware.com) wrote:

On Fri, Nov 18, 2016 at 11:11 AM, Boone, Paul <paulbo...@pitt.edu> wrote:
>
> I’m working with Geoff on the python plugin architecture. There are a couple
> of small changes to the cjson format I’d like to propose. Currently, the cjson
> has structure that is implied but not explicit in the file itself, and this
> forces an adopter of the file to extrapolate the format instead of just
> reading it. Making the structure explicit will make it much easier to use
> the cjson format, as well as making it more intuitive when looking at or
> editing the file directly.
>
> The changes would be:
>
> (1) group atom coordinates by atom:
>
> i.e.:
> "3d": [
> [1,2,3],
> [1,2,5],
> ...
> ]
>
> instead of:
> "3d": [
> 1,2,3,1,2,5,
> ...
> ]
>
> (2) group bonds by bond:
>
> i.e.:
>
> "index" :
> [
> [0,1],
> [0,34],
> [0,35],
> ...
> ]
>
> instead of:
>
> "index" :
> [
> 0,
> 1,
> 0,
> 34,
> 0,
> 35,
> ...
> ]
>
> When the file semantically reflects the actual structure, we can just use
> the cjson as-is without any conversion. Currently, where it doesn’t reflect
> the actual structure, I have to write list comprehensions that are not terribly
> intuitive to marshal the structure back and forth.
>
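For reference, the back-and-forth conversion described in the quoted message can be kept in two small helpers. This is a minimal Python sketch; `group` and `ungroup` are hypothetical names, not part of any existing Avogadro API. `group` checks that the flat length is a multiple of the tuple width so that truncated data fails loudly instead of silently producing a short last entry.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group(flat: Sequence[T], width: int) -> List[List[T]]:
    """Split a flat CJSON array into consecutive sub-lists of `width` items.

    width=3 turns "3d" coordinates into one [x, y, z] list per atom;
    width=2 turns bond "index" values into one [begin, end] list per bond.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if len(flat) % width != 0:
        raise ValueError(f"array length {len(flat)} is not a multiple of {width}")
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]


def ungroup(nested: Sequence[Sequence[T]]) -> List[T]:
    """Flatten per-atom or per-bond sub-lists back into one contiguous list."""
    return [value for item in nested for value in item]


if __name__ == "__main__":
    coords = group([1, 2, 3, 1, 2, 5], 3)   # [[1, 2, 3], [1, 2, 5]]
    bonds = group([0, 1, 0, 34, 0, 35], 2)  # [[0, 1], [0, 34], [0, 35]]
    print(coords, bonds)
    print(ungroup(bonds))                   # [0, 1, 0, 34, 0, 35]
```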
One of the concerns when developing this was storage in BSON, where
each array needs a type, a length, and then the values, so having many
short arrays is not very efficient. Flat arrays also make the mapping from
Avogadro's internal storage very simple: it is much more efficient
to store connectivity, coordinates, etc. in a contiguous block, which is
just written out as a JSON array.
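The BSON overhead can be estimated by hand from the BSON specification, in which an array is encoded as an embedded document whose keys are the decimal indices "0", "1", and so on. Each element costs a type byte, its key, a NUL terminator, and the value, and each array adds a 4-byte length prefix and a trailing NUL. The Python sketch below only computes sizes for lists of floats (BSON doubles) and ints (assumed to fit in int32); it is not a BSON encoder, and the helper names are hypothetical.

```python
def bson_value_size(value) -> int:
    """Bytes taken by a value inside a BSON array (excluding type byte and key)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, float):
        return 8                       # BSON double
    if isinstance(value, int):
        return 4                       # BSON int32 (assumes the value fits)
    if isinstance(value, list):
        return bson_array_size(value)  # nested array = embedded document
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_array_size(items: list) -> int:
    """Total encoded size in bytes of a BSON array document."""
    size = 4                           # int32 length prefix
    for index, item in enumerate(items):
        key = str(index).encode("ascii")
        size += 1 + len(key) + 1       # type byte + index key + NUL terminator
        size += bson_value_size(item)
    return size + 1                    # trailing NUL that closes the document


if __name__ == "__main__":
    flat = [1.0, 2.0, 3.0, 1.0, 2.0, 5.0]
    nested = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0]]
    print(bson_array_size(flat), bson_array_size(nested))
```

For two atoms this gives 71 bytes flat versus 87 nested. Because BSON spells out every index as a string key, the longer keys of a large flat array offset part of the per-sub-array cost, so the gap narrows as the atom count grows; measuring representative files is more reliable than a per-atom rule of thumb.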

I think developing a small amount of C++, Python and JavaScript API code to
interface with the raw arrays is reasonable. If people feel very
strongly about using tuples, the extra space in a JSON text block
isn't too much of a concern, but there is existing code that I would
like to continue supporting for at least the short term.
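On the Python side, such an API could be as small as a read-only view that presents the flat array one atom at a time, so the stored layout stays unchanged. A minimal sketch; `FlatCoordinates` is a hypothetical name, and equivalent C++ or JavaScript wrappers are not shown.

```python
from typing import Iterator, Sequence, Tuple


class FlatCoordinates:
    """Read-only per-atom view over a flat [x0, y0, z0, x1, ...] array.

    Hypothetical helper: it indexes into the existing flat array instead of
    copying it into nested lists.
    """

    def __init__(self, flat: Sequence[float]) -> None:
        if len(flat) % 3 != 0:
            raise ValueError("coordinate array length must be a multiple of 3")
        self._flat = flat

    def __len__(self) -> int:
        return len(self._flat) // 3

    def __getitem__(self, atom: int) -> Tuple[float, float, float]:
        if atom < 0:                 # support negative indices like a list
            atom += len(self)
        if not 0 <= atom < len(self):
            raise IndexError("atom index out of range")
        start = 3 * atom
        return (self._flat[start], self._flat[start + 1], self._flat[start + 2])

    def __iter__(self) -> Iterator[Tuple[float, float, float]]:
        for atom in range(len(self)):
            yield self[atom]


if __name__ == "__main__":
    coords = FlatCoordinates([1.0, 2.0, 3.0, 1.0, 2.0, 5.0])
    for x, y, z in coords:
        print(x, y, z)
```

The trade-off is that Python callers get tuple-style access while the file and the C++ side keep the contiguous layout.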

If we change to tuples, we should bump the format version number and keep code
to read/write version 0 in the reader/writer to support older files.
There are a number of other changes we are considering too.
Alternatively, we could add a new key for the tuples and output that for
the Python code, while retaining the flat arrays for compatibility with
existing code. That seems like a reasonable convenience function.
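One way a reader could keep supporting older files is to accept both layouts and normalize to tuples. The Python sketch below inspects the array contents instead of a version field, since the thread does not specify a version key or the new number. It assumes coordinates live at `doc["atoms"]["coords"]["3d"]` and bond indices at `doc["bonds"]["connections"]["index"]`; those paths and the function names are assumptions for illustration.

```python
from typing import Any, List, Mapping, Tuple


def as_tuples(values: List[Any], width: int) -> List[List[Any]]:
    """Normalize either the flat (version 0) layout or a grouped layout."""
    if values and isinstance(values[0], list):
        # Already grouped: one sub-list per atom or bond.
        if any(len(item) != width for item in values):
            raise ValueError(f"expected sub-arrays of length {width}")
        return [list(item) for item in values]
    # Flat layout: split into consecutive groups of `width` values.
    if len(values) % width != 0:
        raise ValueError(f"flat array length {len(values)} is not a multiple of {width}")
    return [values[i:i + width] for i in range(0, len(values), width)]


def read_geometry(doc: Mapping[str, Any]) -> Tuple[List[List[Any]], List[List[Any]]]:
    """Return (coords, bonds) as per-atom and per-bond lists.

    ASSUMPTION: the key paths below are illustrative; adjust them to match
    the actual CJSON files.
    """
    coords = as_tuples(doc["atoms"]["coords"]["3d"], 3)
    index = doc.get("bonds", {}).get("connections", {}).get("index", [])
    return coords, as_tuples(index, 2)
```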

------------------------------------------------------------------------------

_______________________________________________
Avogadro-devel mailing list
Avogadro-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/avogadro-devel
