On Mon, 9 Oct 2023 at 17:03, Nathan <nathan.goldb...@gmail.com> wrote:
>
> On Mon, Oct 9, 2023 at 12:57 AM Aaron Meurer <asmeu...@gmail.com> wrote:
>>
>> Is it possible to convert a NumPy 1 pickle file into a generic pickle
>> file that works in both NumPy 1 and 2? As far as I understand, pickle
>> is Turing complete, so I imagine it should be theoretically possible,
>> but I don't know how easy it would be to actually do this or how it
>> would affect the pickle file size.

There are many ways that this could be made to work using pickle's
customisation hooks such as __reduce__() etc.

> The issue is that the pickle protocol needs a reference to a reconstructor to 
> recreate numpy types. For ndarray, that function is currently 
> `numpy.core.multiarray._reconstruct` and in numpy 2 becomes
> `numpy._core.multiarray._reconstruct`. For a pickle file containing only an
> ndarray, this is the first thing in the pickle file and the import happens 
> inside of the pickle implementation. I am not aware of a hook that Python 
> gives us to intercept that path before Python imports it.
>
> So, even if there is a way to correct subsequent paths in the pickle file, we 
> won't be able to fix the most problematic path that will occur in any pickle 
> that includes a numpy array. That means some user-visible pain no matter 
> what. If we can't avoid that, I'd prefer to offer a solution that will allow 
> people to continue loading old pickle files indefinitely (albeit with a minor 
> code change). This also gives us a place to put compatibility fixes for 
> future changes that impact old pickle files.
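
A minimal sketch of what such a "minor code change" on the user side might
look like, assuming the only difference is the module rename from
`numpy.core` to `numpy._core` mentioned above. It uses the standard
`pickle.Unpickler.find_class` override; `RenamingUnpickler`,
`DEFAULT_RENAMES` and `load_renamed` are hypothetical names for
illustration, not anything NumPy provides.

```python
import io
import pickle

# Assumed rename, taken from the discussion above.
DEFAULT_RENAMES = {"numpy.core": "numpy._core"}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites old module paths before they are imported."""

    def __init__(self, file, renames=None, **kwargs):
        super().__init__(file, **kwargs)
        self._renames = dict(DEFAULT_RENAMES if renames is None else renames)

    def find_class(self, module, name):
        # pickle calls this for every global it needs to import.
        for old, new in self._renames.items():
            if module == old or module.startswith(old + "."):
                module = new + module[len(old):]
                break
        return super().find_class(module, name)


def load_renamed(data, renames=None):
    """Load pickle bytes, applying the module renames on the way in."""
    return RenamingUnpickler(io.BytesIO(data), renames=renames).load()
```

The caller has to opt in by calling something like `load_renamed(data)`
instead of `pickle.loads(data)`, which is the user-visible change being
discussed; a plain `pickle.load` never sees the override.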

Suppose that there is NumPy v1 and that in future there will be NumPy
v2. Also suppose that there will be two NumPy pickle formats fmtA and
a future fmtB. One possibility is that NumPy v1 only reads and writes
fmtA and then NumPy v2 only reads and writes fmtB. One problem with
this is that when NumPy v2 comes out there is no easy way to convert
pickles from fmtA to fmtB for compatibility with NumPy v2. Another
problem is that it gives no smooth transition during any period when
both NumPy v1 and v2 are in use in different parts of a software
stack.

An alternative is to introduce fmtB as part of the NumPy v1 series.
NumPy could be changed now so that it can read both fmtA and fmtB but
by default it would write fmtB which would be designed ahead of time
so that in future NumPy v2 would be able to read fmtB as well. It
would also be possible to design it so that fmtB would be readable by
older versions of NumPy that were released before fmtB was designed.
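
To illustrate the mechanism that makes this kind of design possible: the
import path stored in a pickle is that of whatever callable `__reduce__`
returns, not necessarily the path of the class itself. The sketch below
uses a hypothetical `Ratio` class, with `fractions.Fraction` standing in
for a reconstructor that lives at an import path promised to stay stable.
It is only an illustration of the idea, not how NumPy pickles arrays.

```python
import pickle
from fractions import Fraction


class Ratio:
    """Hypothetical type whose own import path may change in future."""

    def __init__(self, num, den):
        self.num = num
        self.den = den

    def __reduce__(self):
        # pickle records the import path of this callable (here
        # fractions.Fraction), not the path of Ratio itself.
        return (Fraction, (self.num, self.den))


payload = pickle.dumps(Ratio(3, 4))
restored = pickle.loads(payload)  # rebuilt by calling Fraction(3, 4)
```

Because the payload only names the stable callable, any reader that can
import that callable can load it, wherever `Ratio` is defined.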

Then there is a version of NumPy (v1) which can read fmtA and write to
fmtB. This version of NumPy can be used to convert pickles from fmtA
to fmtB. Then when NumPy v2 is released it can already read any
pickles that were generated by the most recent releases of NumPy v1.x.
Anyone who still has older pickles in fmtA could use NumPy v1 to do
dumps(loads(f)) which would convert from fmtA to fmtB.
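
The conversion step could be sketched roughly as follows, assuming it is
run under a NumPy v1 release that reads fmtA and writes fmtB by default.
`convert_pickle` is a hypothetical helper name and the format change itself
would come from NumPy, not from this script.

```python
import pickle


def convert_pickle(src_path, dst_path, protocol=pickle.HIGHEST_PROTOCOL):
    """Load a pickle and write it back out with the installed libraries.

    Whatever format the installed library versions write by default is
    the format that ends up in dst_path.
    """
    with open(src_path, "rb") as src:
        obj = pickle.load(src)
    with open(dst_path, "wb") as dst:
        pickle.dump(obj, dst, protocol=protocol)
```

Writing to a separate destination path keeps the original file intact in
case the round trip needs to be checked or repeated.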

In this scenario the only part that does not work is reading fmtA in
NumPy v2 which is unavoidable if numpy.core is removed or renamed in
v2.

--
Oscar
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/