On Friday 05 December 2008, Brennan Williams wrote:
> Robert Kern wrote:
> > On Thu, Dec 4, 2008 at 18:54, Brennan Williams
> > <[EMAIL PROTECTED]> wrote:
> >> Thanks
> >>
> >> [EMAIL PROTECTED] wrote:
> >>> I didn't check what this does behind the scenes, but try this
> >>
> >> import hashlib #standard python library
> >> import numpy as np
> >>
> >>> m = hashlib.md5()
> >>> m.update(np.array(range(100)))
> >>> m.update(np.array(range(200)))
> >
> > I would recommend doing this on the strings before you make arrays
> > from them. You don't know if the network cut out in the middle of
> > an 8-byte double.
> >
> > Of course, sending the lengths and other metadata first, then the
> > data would let you check without needing to do expensivish hashes
> > or checksums. If truncation is your problem rather than corruption,
> > then that would be sufficient. You may also consider using the NPY
> > format in numpy 1.2 to implement that.
>
> Thanks for the ideas. I'm definitely going to add some more basic
> checks on lengths etc as well.
> Unfortunately the problem is happening at a client site so (a) I
> can't reproduce it and (b) most of the time they can't reproduce it
> either. This is a Windows Python app running on Citrix reading/writing
> data to a Linux networked drive.
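Robert's two suggestions (hash the raw bytes before turning them into arrays, and send the length ahead of the data) can be combined into one simple framing scheme. The following is a minimal sketch under stated assumptions: the wire format (an 8-byte big-endian length and a 16-byte MD5 digest in front of the payload) and the helper names `frame`/`unframe` are hypothetical, not something proposed in the thread. It uses only the standard library.

```python
import hashlib
import struct

# Hypothetical wire format (an assumption, not from the thread):
# 8-byte big-endian payload length, 16-byte MD5 digest of the raw payload, payload.
HEADER = struct.Struct(">Q16s")


def frame(payload: bytes) -> bytes:
    """Prefix the raw bytes with their length and MD5 digest."""
    digest = hashlib.md5(payload).digest()
    return HEADER.pack(len(payload), digest) + payload


def unframe(message: bytes) -> bytes:
    """Check the length first (cheap, catches truncation), then MD5 (catches corruption)."""
    if len(message) < HEADER.size:
        raise ValueError("truncated header")
    length, digest = HEADER.unpack_from(message)
    payload = message[HEADER.size:]
    if len(payload) != length:
        raise ValueError(f"expected {length} bytes, got {len(payload)}")
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("MD5 mismatch: payload corrupted")
    return payload
```

With NumPy, the sender would pass something like `frame(arr.tobytes())`, and the receiver would rebuild the array with `np.frombuffer(payload, dtype=...)` only after `unframe` succeeds, so a half-transferred 8-byte double never becomes an array element. The dtype and shape would still have to travel with the data, which is the kind of metadata the NPY format's header records.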
Another possibility would be to use HDF5 as a data container. It supports
the fletcher32 filter [1], which computes a checksum for every data chunk
written to disk and then, on every read, checks that the data read back
matches the checksum stored on disk. So, if the HDF5 layer doesn't
complain, you can be fairly confident that what you read is what was
written.

There are at least two usable HDF5 interfaces for Python and NumPy:
PyTables [2] and h5py [3]. PyTables supports this right out of the box.
I'm not sure about h5py, though (a quick search of its docs didn't turn
anything up).

[1] http://rfc.sunsite.dk/rfc/rfc1071.html
[2] http://www.pytables.org
[3] http://h5py.alfven.org

Hope it helps,

--
Francesc Alted
_______________________________________________
Numpy-discussion mailing list
[email protected]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
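To illustrate what a per-chunk checksum like the fletcher32 filter does, here is a minimal sketch in plain Python. It implements the textbook Fletcher-32 (sums over little-endian 16-bit words, modulo 65535, odd-length input zero-padded) and stores one checksum per chunk, verifying each on read. It is an assumption-laden illustration, not HDF5's code: HDF5's own implementation may differ in word byte order and padding, so these values are not meant to match HDF5's on-disk checksums, and the `write_chunks`/`read_chunks` helpers and their layout are hypothetical.

```python
from __future__ import annotations


def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words; odd length is zero-padded."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def write_chunks(data: bytes, chunk_size: int) -> list[tuple[bytes, int]]:
    """Hypothetical layout: split data into chunks and keep each with its checksum."""
    return [
        (data[i:i + chunk_size], fletcher32(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    ]


def read_chunks(stored: list[tuple[bytes, int]]) -> bytes:
    """Recompute every chunk's checksum on read and fail loudly on a mismatch."""
    out = bytearray()
    for index, (chunk, checksum) in enumerate(stored):
        if fletcher32(chunk) != checksum:
            raise IOError(f"checksum mismatch in chunk {index}")
        out += chunk
    return bytes(out)


print(hex(fletcher32(b"abcde")))  # prints 0xf04fc729
```

In practice you would not write this yourself: with PyTables the real filter is switched on by passing `tables.Filters(fletcher32=True)` when creating a chunked dataset, and the library then performs the check on every read.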
