On 20/2/24 01:24, phili...@loco-labs.io wrote:

Hi community,

This memo is a proposal to implement a compact and reversible (lossless 
round-trip) JSON interface for multi-dimensional data, and in particular for 
NumPy (see issue #12481). The links to the documents are at the end of the memo.

The JSON-NTV (Named and Typed Value) format is a JSON format that integrates a 
notion of type. This format has also been implemented for tabular data (see the 
NTV-pandas package available in the pandas ecosystem and the PDEP12 
specification).

The use of this format has the following advantages:
- Support for data types not known to NumPy
- Reversible format (lossless round-trip)
- Interoperability with other tools for tabular or multi-dimensional data (e.g. 
pandas, Xarray)
- Ease of sharing (JSON format)
- Binary encoding possible (e.g. CBOR format)
- A single format that can combine data of different kinds
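
As a rough illustration of the "reversible" point above, here is a minimal 
sketch of a lossless JSON round-trip for an ndarray. The layout (the keys 
"dtype", "shape" and "data") and the helper names to_json / from_json are 
hypothetical, chosen only for this example; they are not the JSON-NTV syntax 
described in the linked documents.

```python
import json

import numpy as np


def to_json(arr):
    """Encode an ndarray as JSON text (hypothetical layout, not JSON-NTV)."""
    return json.dumps({
        "dtype": arr.dtype.str,        # e.g. "<i2", keeps size and byte order
        "shape": list(arr.shape),      # explicit shape, so data can stay flat
        "data": arr.ravel().tolist(),  # values as plain JSON numbers
    })


def from_json(text):
    """Rebuild the ndarray from the JSON text produced by to_json."""
    obj = json.loads(text)
    return np.array(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])


original = np.arange(6, dtype="int16").reshape(2, 3)
restored = from_json(to_json(original))

# The round trip keeps values, shape and dtype.
print(restored.dtype == original.dtype, np.array_equal(restored, original))
```

Without the explicit "dtype" entry, the int16 array above would come back with 
NumPy's default integer type, which is the kind of loss a typed format is meant 
to avoid.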

The associated Jupyter Notebook presents some key points of this proposal 
(first draft):

Summary:
   - introduction
   - benefits
   - multi-dimensional data
       - Multi-dimensional types
       - JSON format
       - Using the NTV format
       - Equivalence of tabular format and multidimensional format
   - Astropy specific points
       - Units and quantities
       - Coordinates
       - Tables
       - Other structures

This subject seems important to me (in particular for interoperability issues) 
and I would like to have your feedback before working on the implementation.
In particular:
- do you think this “semantic” format is interesting to use?
- do you have any particular expectations or subjects that I need to study 
beforehand?
- do you have any examples or test cases to offer me?
And of course, any type of remark and comment is welcome.

Thanks in advance!

links:
- Jupyter notebook : 
https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/python/Tests/numpy_tests.ipynb
- JSON-NTV format : https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html
- JSON-NTV overview : 
https://nbviewer.org/github/loco-philippe/NTV/blob/main/example/example_ntv.ipynb
- NTV tabular format : 
https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#name-tabular-structure
- NTV-pandas package : 
https://github.com/loco-philippe/ntv-pandas/blob/main/README.md
- NTV-pandas examples : 
https://nbviewer.org/github/loco-philippe/ntv-pandas/blob/main/example/example_ntv_pandas.ipynb
- Pandas specification - PDEP12 : 
https://pandas.pydata.org/pdeps/0012-compact-and-reversible-JSON-interface.html

There is an open issue [1] about such a format. Is this the same proposal or a different one?


We discussed this at the latest triage meeting. While interoperability is one of NumPy's goals, and something we care deeply about, we were not sure how this initiative will play out. Perhaps, like the pandas package, it should live outside NumPy for a while until a wider consensus emerges. We did have a few questions about the standard:

- How does it handle sharing data? NumPy can handle very large ndarrays, and a read-only container with a shared memory location, as in DLPack [0], seems more natural than a format that precludes sharing data.

- Is there a size limitation, either on the data or on the number of dimensions? Could this format represent, for instance, data with more than 100 dimensions, which could not be mapped back to NumPy?
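
To make the two questions above concrete, here is a minimal sketch. It assumes a NumPy version that provides np.from_dlpack (1.22 or later); the helper name fits_in_numpy is made up for this example. The first part contrasts a DLPack exchange, which shares the buffer, with a JSON round-trip, which always copies. The second part shows that a 100-dimensional shape is rejected by NumPy.

```python
import json

import numpy as np

# Part 1: sharing versus copying.
a = np.arange(4, dtype="float64")

shared = np.from_dlpack(a)           # DLPack exchange: no copy of the buffer
copied = np.array(json.loads(json.dumps(a.tolist())))  # JSON: new buffer

print(np.shares_memory(a, shared))   # expected True
print(np.shares_memory(a, copied))   # expected False


# Part 2: number of dimensions.
def fits_in_numpy(shape):
    """Return True if NumPy accepts an array with this shape."""
    try:
        np.empty(shape, dtype="uint8")
    except ValueError:
        # Raised when the shape has more dimensions than NumPy supports.
        return False
    return True


print(fits_in_numpy((1,) * 3))       # expected True
print(fits_in_numpy((1,) * 100))     # expected False
```

The exact dimension limit depends on the NumPy version, so the sketch probes it rather than hard-coding a number.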


Matti


[0] https://dmlc.github.io/dlpack/latest/

[1] https://github.com/numpy/numpy/issues/12481

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/