On 11/25/21 17:05, Stephan Hoyer wrote:
Hi Qianqian,

What is your concrete proposal for NumPy here?

Are you suggesting new methods or functions like to_json/from_json in NumPy itself?


That would work. Either defining a subclass of JSONEncoder that serializes ndarray, which users could pass as cls to json.dump, or, as you mentioned, defining to_json/from_json like pandas DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html>, would save people from writing custom code and formats.
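
To make the first option concrete, here is a minimal sketch. The class name NDArrayEncoder is hypothetical (it is not part of numpy or jdata), and it simply falls back to nested lists, so dtype information is not preserved:

```python
import json

import numpy as np


class NDArrayEncoder(json.JSONEncoder):
    """Hypothetical encoder that turns ndarrays into nested lists."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize itself.
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # NumPy scalars such as np.int64 are not plain Python ints.
        if isinstance(obj, np.generic):
            return obj.item()
        # Let the base class raise the usual TypeError for anything else.
        return super().default(obj)


data = {"name": "example", "values": np.arange(6).reshape(2, 3)}
text = json.dumps(data, cls=NDArrayEncoder)
print(text)
```

Users would still have to pass cls=NDArrayEncoder on every call, which is the inconvenience discussed in this thread.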

I am also wondering if there is a more automated way to tell json.dump/dumps to use a default serializer for ndarray without using cls=...? I saw an SO post that mentioned a method called "__serialize__" on a class, but I can't find it in the official docs. Is anyone aware of a method that defines a default JSON serializer for an object?
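
For reference, the closest built-in mechanism I know of in the standard json module is the default= argument, which is still per call rather than per object. A small sketch, where the function name ndarray_default is hypothetical:

```python
import json

import numpy as np


def ndarray_default(obj):
    """Hypothetical fallback passed to json.dumps via default=."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    # Mirror the standard error for unsupported types.
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


arr = np.array([1.5, 2.5])

# Without a fallback, the json module rejects the array.
try:
    json.dumps(arr)
    failed_without_default = False
except TypeError:
    failed_without_default = True

# With default=, the fallback is consulted for the unsupported object.
text = json.dumps({"a": arr}, default=ndarray_default)
print(failed_without_default, text)
```

This avoids defining a class, but the caller still has to opt in explicitly on each dump/dumps call.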


As far as I can tell, reading/writing in your custom JSON format already works with your jdata library.


Ideally, I was hoping the small jdata encoder/decoder functions could be integrated into numpy. That would avoid the "TypeError: Object of type ndarray is not JSON serializable" error in json.dump/dumps without needing additional modules; more importantly, it would simplify the user experience of exchanging complex arrays (complex-valued, sparse, special shapes) with other programming environments.
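
As a rough illustration of the idea, and not the actual jdata implementation, the sketch below round-trips a dense real-valued array through an annotated dictionary. It assumes the JData tags _ArrayType_, _ArraySize_ and _ArrayData_ with row-major data; the helper names and the small type-name table are hypothetical, and complex, sparse and compressed arrays are not handled:

```python
import json

import numpy as np

# Assumed (partial) mapping between NumPy dtype names and JData type names.
_DTYPE_TO_JDATA = {
    "float64": "double",
    "float32": "single",
    "int32": "int32",
    "int64": "int64",
    "uint8": "uint8",
}
_JDATA_TO_DTYPE = {v: k for k, v in _DTYPE_TO_JDATA.items()}


def encode_array(obj):
    """Hypothetical default= hook: ndarray -> annotated dict."""
    if not isinstance(obj, np.ndarray):
        raise TypeError(
            f"Object of type {type(obj).__name__} is not JSON serializable"
        )
    return {
        "_ArrayType_": _DTYPE_TO_JDATA[obj.dtype.name],
        "_ArraySize_": list(obj.shape),
        # Flatten in row-major (C) order.
        "_ArrayData_": obj.ravel().tolist(),
    }


def decode_array(obj):
    """Hypothetical object_hook: annotated dict -> ndarray."""
    if "_ArrayType_" in obj and "_ArrayData_" in obj:
        dtype = _JDATA_TO_DTYPE[obj["_ArrayType_"]]
        flat = np.array(obj["_ArrayData_"], dtype=dtype)
        return flat.reshape(obj["_ArraySize_"])
    return obj


original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
text = json.dumps({"img": original}, default=encode_array)
restored = json.loads(text, object_hook=decode_array)["img"]
print(text)
```

Because the type and shape travel with the data, the decoded array keeps its dtype and dimensions, which a plain nested list would not guarantee.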

Qianqian



Best,
Stephan

On Thu, Nov 25, 2021 at 2:35 PM Qianqian Fang <q.f...@neu.edu> wrote:

    Dear numpy developers,

    I would like to share a proposal on making ndarray JSON
    serializable by default, as detailed in this github issue:

    https://github.com/numpy/numpy/issues/20461


    Briefly, my group and collaborators are working on a new NIH
    (National Institutes of Health) funded initiative - NeuroJSON
    (http://neurojson.org) - to further disseminate a lightweight data
    annotation specification (JData
    <https://github.com/NeuroJSON/jdata/blob/master/JData_specification.md>)
    among the broad neuroimaging/scientific community. Python and
    numpy have been widely used
    <http://neuro.debian.net/_files/nipy-handout.pdf> in neuroimaging
    data analysis pipelines (nipy, nibabel, mne-python, PySurfer ...),
    because the N-D array is THE most important data structure used in
    scientific data. However, numpy currently does not support JSON
    serialization by default. This is one of the frequently requested
    features on github (#16432, #12481).

    We have developed lightweight Python modules (jdata
    <https://pypi.org/project/jdata/>, bjdata
    <https://pypi.org/project/bjdata/>) to help export/import ndarray
    objects to/from JSON (and a binary JSON format - BJData
    <https://github.com/NeuroJSON/bjdata/blob/master/Binary_JData_Specification.md>/UBJSON
    <http://ubjson.org/> - to gain efficiency). The approach is to
    convert ndarray objects to a dictionary with subfields using
    standardized JData annotation tags. The JData spec can serialize
    complex data structures such as N-D arrays (solid, sparse,
    complex), trees, graphs, tables etc. It also permits data
    compression. These annotations have been implemented in my MATLAB
    toolbox - JSONLab <https://github.com/fangq/jsonlab> - since 2011
    to help import/export MATLAB data types, and have been broadly
    used among MATLAB/GNU Octave users.

    Examples of these portable JSON annotation tags representing N-D
    arrays can be found at

    http://openjdata.org/wiki/index.cgi?JData/Examples/Basic#2_D_arrays_in_the_annotated_format
    http://openjdata.org/wiki/index.cgi?JData/Examples/Advanced

    and the detailed formats on N-D array annotations can be found in
    the spec:

    https://github.com/NeuroJSON/jdata/blob/master/JData_specification.md#annotated-storage-of-n-d-arrays


    In our current Python module, encoding/decoding ndarray to/from
    JSON-serializable forms is implemented in these compact functions
    (handling lossless type/data conversion and data compression):

    https://github.com/NeuroJSON/pyjdata/blob/63301d41c7b97fc678fa0ab0829f76c762a16354/jdata/jdata.py#L72-L97
    https://github.com/NeuroJSON/pyjdata/blob/63301d41c7b97fc678fa0ab0829f76c762a16354/jdata/jdata.py#L126-L160

    We strongly believe that enabling JSON serialization by default
    will benefit the numpy user community, making it a lot easier to
    share complex data between platforms
    (MATLAB/Python/C/FORTRAN/JavaScript...) via a
    standardized/NIH-backed data annotation scheme.

    We would be happy to hear your thoughts and suggestions on how to
    contribute, and are glad to set up dedicated discussions.

    Cheers

    Qianqian

    _______________________________________________
    NumPy-Discussion mailing list -- numpy-discussion@python.org
    To unsubscribe send an email to numpy-discussion-le...@python.org
    https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
    Member address: sho...@gmail.com

