Hi there,

I have written a Go package [1] that can read and write simple arrays in the
NumPy file format [2].
When I wrote it, it targeted simple interoperability use cases, but people
would now like to be able to read back ragged arrays [3].

Unless I am mistaken, this means I need to interpret pieces of pickled data
(`ndarray`, `multiarray` and `dtype`).

So I am trying to understand how to unpickle pickled `dtype` values:

```python
import numpy as np
import pickle
import pickletools as pt

pt.dis(pickle.dumps(np.dtype("int32"), protocol=4), annotate=True)
```

gives:
```
    0: \x80 PROTO      4 Protocol version indicator.
    2: \x95 FRAME      55 Indicate the beginning of a new frame.
   11: \x8c SHORT_BINUNICODE 'numpy' Push a Python Unicode string object.
   18: \x94 MEMOIZE    (as 0)        Store the stack top into the memo.  The stack is not popped.
   19: \x8c SHORT_BINUNICODE 'dtype' Push a Python Unicode string object.
   26: \x94 MEMOIZE    (as 1)        Store the stack top into the memo.  The stack is not popped.
   27: \x93 STACK_GLOBAL             Push a global object (module.attr) on the stack.
   28: \x94 MEMOIZE    (as 2)        Store the stack top into the memo.  The stack is not popped.
   29: \x8c SHORT_BINUNICODE 'i4'    Push a Python Unicode string object.
   33: \x94 MEMOIZE    (as 3)        Store the stack top into the memo.  The stack is not popped.
   34: \x89 NEWFALSE                 Push False onto the stack.
   35: \x88 NEWTRUE                  Push True onto the stack.
   36: \x87 TUPLE3                   Build a three-tuple out of the top three items on the stack.
   37: \x94 MEMOIZE    (as 4)        Store the stack top into the memo.  The stack is not popped.
   38: R    REDUCE                   Push an object built from a callable and an argument tuple.
   39: \x94 MEMOIZE    (as 5)        Store the stack top into the memo.  The stack is not popped.
   40: (    MARK                     Push markobject onto the stack.
   41: K        BININT1    3         Push a one-byte unsigned integer.
   43: \x8c     SHORT_BINUNICODE '<' Push a Python Unicode string object.
   46: \x94     MEMOIZE    (as 6)    Store the stack top into the memo.  The stack is not popped.
   47: N        NONE                 Push None on the stack.
   48: N        NONE                 Push None on the stack.
   49: N        NONE                 Push None on the stack.
   50: J        BININT     -1        Push a four-byte signed integer.
   55: J        BININT     -1        Push a four-byte signed integer.
   60: K        BININT1    0         Push a one-byte unsigned integer.
   62: t        TUPLE      (MARK at 40) Build a tuple out of the topmost stack slice, after markobject.
   63: \x94 MEMOIZE    (as 7)           Store the stack top into the memo.  The stack is not popped.
   64: b    BUILD                       Finish building an object, via __setstate__ or dict update.
   65: .    STOP                        Stop the unpickling machine.
highest protocol among opcodes = 4
```
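
To make the opcode stream above concrete, here is a minimal sketch of the kind of stack machine a non-Python reader would need in order to pull the callable, its arguments and the state tuple out of this pickle without importing numpy. The byte string is reconstructed by hand from the disassembly, and `replay_dtype_pickle` and the `MARK` sentinel are hypothetical names used only for illustration. It handles just the opcodes that appear in this dump and ignores the memo, which is safe here only because this pickle never reads the memo back.

```python
import pickletools

# Bytes reconstructed by hand from the disassembly above (protocol 4, 66 bytes).
PICKLED = (
    b"\x80\x04\x95\x37\x00\x00\x00\x00\x00\x00\x00"
    b"\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94"
    b"\x8c\x02i4\x94\x89\x88\x87\x94R\x94"
    b"(K\x03\x8c\x01<\x94NNN"
    b"J\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b."
)

MARK = object()  # hypothetical sentinel standing in for pickle's markobject


def replay_dtype_pickle(data):
    """Replay the opcodes without calling anything; return what was found."""
    stack = []
    for op, arg, _pos in pickletools.genops(data):
        name = op.name
        if name in ("PROTO", "FRAME", "MEMOIZE"):
            continue  # framing and memo writes do not change the stack
        elif name in ("SHORT_BINUNICODE", "BININT", "BININT1"):
            stack.append(arg)
        elif name == "NONE":
            stack.append(None)
        elif name == "NEWTRUE":
            stack.append(True)
        elif name == "NEWFALSE":
            stack.append(False)
        elif name == "STACK_GLOBAL":
            attr = stack.pop()
            module = stack.pop()
            stack.append((module, attr))  # keep the name, do not import it
        elif name == "TUPLE3":
            items = tuple(stack[-3:])
            del stack[-3:]
            stack.append(items)
        elif name == "MARK":
            stack.append(MARK)
        elif name == "TUPLE":
            start = max(i for i, v in enumerate(stack) if v is MARK)
            items = tuple(stack[start + 1:])
            del stack[start:]
            stack.append(items)
        elif name == "REDUCE":
            args = stack.pop()
            func = stack.pop()
            stack.append({"callable": func, "args": args, "state": None})
        elif name == "BUILD":
            state = stack.pop()
            stack[-1]["state"] = state
        elif name == "STOP":
            return stack.pop()
        else:
            raise NotImplementedError(name)
    raise ValueError("pickle stream ended without STOP")


decoded = replay_dtype_pickle(PICKLED)
print(decoded)
```

The result only records which global was named and which tuples were passed to `REDUCE` and `BUILD`; it says nothing about what the individual tuple entries mean.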

I have tried to find the usual `__reduce__` and `__setstate__` methods to
understand what the various arguments mean, to no avail.

So, in:
```python
>>> np.dtype("int32").__reduce__()[1]
('i4', False, True)
>>> np.dtype("int32").__reduce__()[2]
(3, '<', None, None, None, -1, -1, 0)
```
what is the meaning of the various arguments?
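
For context, the generic pickle protocol uses the two tuples as follows: `REDUCE` calls the callable with the first tuple, and `BUILD` hands the second tuple to `__setstate__` on the resulting object. The sketch below shows this with a hypothetical stand-in class, `FakeDType`, which is not numpy's `dtype`. Its parameter names are invented, and it stores the state opaquely precisely because the meaning of each entry is the open question.

```python
class FakeDType:
    """Hypothetical stand-in for illustration only, not numpy's dtype."""

    def __init__(self, code, flag_a=False, flag_b=False):
        # The parameter names are placeholders, not numpy's real ones.
        self.code = code
        self.flag_a = flag_a
        self.flag_b = flag_b
        self.state = None

    def __reduce__(self):
        # Same shape as the numpy output: (callable, args, state).
        return (
            FakeDType,
            (self.code, False, True),
            (3, "<", None, None, None, -1, -1, 0),
        )

    def __setstate__(self, state):
        # Stored as-is; a real reader would have to interpret each entry.
        self.state = state


func, args, state = FakeDType("i4").__reduce__()

rebuilt = func(*args)        # what the REDUCE opcode does
rebuilt.__setstate__(state)  # what the BUILD opcode does

print(rebuilt.code, rebuilt.flag_a, rebuilt.flag_b, rebuilt.state)
```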

thanks in advance,
sebastien.

[1] https://github.com/sbinet/npyio
[2] https://numpy.org/neps/nep-0001-npy-format.html
[3] https://github.com/sbinet/npyio/issues/20