Re: [Numpy-discussion] Transparently reading complex arrays from netcdf4

Stephan Hoyer Sun, 30 Mar 2014 16:34:21 -0700

Hi Glenn,

Here is the line in my linked code defining the __array__ method:
https://github.com/akleeman/xray/blob/0c1a963be0542b7303dc875278f3b163a15429c5/src/xray/conventions.py#L152
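
As a rough illustration (this is not the code at the link above), a wrapper
with an __array__ method might look like the sketch below. It assumes the
compound type holds two 64-bit floats, and it uses a NumPy structured array
named `pairs` as a hypothetical stand-in for the netCDF4 variable so that it
runs without a file on disk.

```python
import numpy as np


class WrapComplex(object):
    """Present a compound (real, imag) variable as a complex array."""

    def __init__(self, nc_var):
        self.nc_var = nc_var

    def __getitem__(self, item):
        # Read only the requested slice, then reinterpret each pair of
        # 64-bit floats as one complex128 value.
        return np.asarray(self.nc_var[item]).view('complex')

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray() and friends; this reads the whole variable.
        arr = self[...]
        return arr if dtype is None else arr.astype(dtype)


# Assumption: a structured array standing in for the netCDF4 variable.
pairs = np.zeros(4, dtype=[('real', 'f8'), ('imag', 'f8')])
pairs['real'] = [1.0, 2.0, 3.0, 4.0]
pairs['imag'] = [0.5, 0.5, 0.5, 0.5]

cplx_data = WrapComplex(pairs)
first_two = cplx_data[:2]         # converts only the first two values
mean_value = np.mean(cplx_data)   # NumPy converts via __array__
```

With this in place, np.mean(cplx_data) works without writing cplx_data[:],
at the cost of reading the full variable whenever NumPy asks for the array.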


I don't know when Jeff Whitaker will be releasing the next version of
netCDF4, but I expect that might be pretty soon if you asked nicely!
Otherwise you can always download the development version off of github:
https://github.com/Unidata/netcdf4-python

Cheers,
Stephan


On Sun, Mar 30, 2014 at 5:18 AM, G Jones <glenn.calt...@gmail.com> wrote:

> Hi,
> This looks useful. What you said about __array__ makes sense, but I didn't
> see it in the code you linked.
> Do you know when python netcdf4 will support the numpy array interface
> directly? I searched around for a roadmap but didn't find anything. It may
> be best for me to proceed with a slightly clumsy interface for now and wait
> until the array interface is built in for free.
>
> Thanks,
> Glenn
> On Mar 30, 2014 2:18 AM, "Stephan Hoyer" <sho...@gmail.com> wrote:
>
>> Hi Glenn,
>>
>> Here is a full example of how we wrap a netCDF4.Variable object,
>> implementing all of its ndarray-like methods:
>>
>> https://github.com/akleeman/xray/blob/0c1a963be0542b7303dc875278f3b163a15429c5/src/xray/conventions.py#L91
>>
>> The __array__ method would be the most relevant one for you: it means
>> that numpy knows how to convert the wrapper array into a numpy.ndarray when
>> you call np.mean(cplx_data). More generally, any function that calls
>> np.asarray(cplx_data) will properly convert the values, which should
>> include most functions from well-written libraries (including numpy and
>> scipy). netCDF4.Variable doesn't currently have such an __array__ method,
>> but it will in the next released version of the library.
>>
>> The quick and dirty hack to make all numpy methods work (now going beyond
>> what the netCDF4 library implements) would be to add something like the
>> following:
>>
>>     def __getattr__(self, attr):
>>         return getattr(np.asarray(self), attr)
>>
>> But this is a little dangerous, since some methods might silently fail or
>> give unpredictable results (e.g., those that modify data). It would be
>> safer to list the methods you want to implement explicitly, or to just
>> liberally use np.asarray. The latter is generally a good practice when
>> writing library code anyway, to catch unusual ndarray subclasses like
>> np.matrix.
>>
>> Stephan
>>
>>
>> On Sat, Mar 29, 2014 at 8:42 PM, G Jones <glenn.calt...@gmail.com> wrote:
>>
>>> Hi Stephan,
>>> Thanks for the reply. I was thinking of something along these lines but
>>> was hesitant because while this provides clean access to chunks of the
>>> data, you still have to remember to do cplx_data[:].mean() for example in
>>> the case that you want cplx_data.mean().
>>>
>>> I was hoping to basically have all of the ndarray methods at hand
>>> without any indexing, but then also being smart about taking advantage of
>>> the mmap when possible. But perhaps your solution is the best compromise.
>>>
>>> Thanks again,
>>> Glenn
>>> On Mar 29, 2014 10:59 PM, "Stephan Hoyer" <sho...@gmail.com> wrote:
>>>
>>>> Hi Glenn,
>>>>
>>>> My usual strategy for this sort of thing is to make a light-weight
>>>> wrapper class which reads and converts values when you access them. For
>>>> example:
>>>>
>>>> import netCDF4
>>>>
>>>> class WrapComplex(object):
>>>>     def __init__(self, nc_var):
>>>>         self.nc_var = nc_var
>>>>
>>>>     def __getitem__(self, item):
>>>>         # Read only the requested slice, then view it as complex.
>>>>         return self.nc_var[item].view('complex')
>>>>
>>>> nc = netCDF4.Dataset('my.nc')
>>>> cplx_data = WrapComplex(nc.groups['mygroup'].variables['cplx_stuff'])
>>>>
>>>> Now you can index cplx_data (e.g., cplx_data[:10]) and only the values
>>>> you need will be read from disk and converted on the fly.
>>>>
>>>> Hope this helps!
>>>>
>>>> Cheers,
>>>> Stephan
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Mar 29, 2014 at 6:13 PM, G Jones <glenn.calt...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I am using netCDF4 to store complex data using the recommended
>>>>> strategy of creating a compound data type with the real and imaginary
>>>>> parts. This all works well, but reading the data into a numpy array is a
>>>>> bit clumsy.
>>>>>
>>>>> Typically I do:
>>>>>
>>>>> nc = netCDF4.Dataset('my.nc')
>>>>> cplx_data = nc.groups['mygroup'].variables['cplx_stuff'][:].view('complex')
>>>>>
>>>>> which directly gives a nice complex numpy array. This is OK for small
>>>>> arrays, but is wasteful if I only need some chunks of the array because it
>>>>> reads all the data in, reducing the utility of the mmap feature of netCDF.
>>>>>
>>>>> I'm wondering if there is a better way to directly make a numpy array
>>>>> view that uses the netcdf variable's memory mapped buffer directly.
>>>>> Looking at the Variable class, there is no access to this buffer directly
>>>>> which could then be passed to np.ndarray(buffer=...).
>>>>>
>>>>> Any ideas of simple solutions to this problem?
>>>>>
>>>>> Thanks,
>>>>> Glenn
>>>>>
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion@scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion