On Jul 6, 2009, at 1:12 PM, Elaine Angelino wrote:
> Hi -- We are subclassing from np.rec.recarray and are confused about
> how some methods of np.rec.recarray relate to (differ from)
> analogous methods of its parent, np.ndarray. Below are specific
> questions about the __eq__, __getitem__ and view methods, we'd
> appreciate answers to our specific questions and/or more general
> points that we may be not understanding about subclassing from
> np.ndarray (and np.rec.recarray).

For generic information about subclassing, please refer to:
http://www.scipy.org/Subclasses
http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

> 1) Suppose I have a recarray object, x. How come
> np.ndarray.__getitem__(x, 'column_name') returns a recarray object
> rather than a ndarray? e.g.,

ndarray.__getitem__(x, item) calls x.__array_finalize__ if item is a
basestring and not an integer. __array_finalize__ outputs an array of
the same subtype as x (here, a recarray).

> 2)a) When I use the __getitem__ method of recarray to get an
> individual column, the returned object is an ndarray when the column
> is a numeric type but it is a recarray when the column is a string
> type. Why doesn't __getitem__ always return an ndarray for an
> individual column? e.g.,
>
>
> In [175]: x = np.rec.fromrecords([(1,'dd'), (2,'cc')],
> names=['a','b'])
>

In your example:
>>> x.dtype
dtype([('a', '<i4'), ('b', '|S2')])

So, field 'a' has a dtype int, which is a built-in dtype, while field
'b' has a dtype '|S2', which is NOT a built-in dtype. The code of
recarray.__getitem__ shows you that in the first case, when the dtype
of the output is a built-in, the output recarray (x['a']) is viewed as
a standard ndarray. Not the case with x['b']. Why? Ask Travis O.

> 2)b) Suppose I have a subclass of recarray, NewRecarray, that
> attaches some new attribute, e.g. 'info'.
>
> x = NewRecarray(data, names = ['a','b'], formats = '<i4, |S2')
>
> Now say I want to use recarray's __getitem__ method to get an
> individual column. Then
>
> x['a'] is an ndarray
> x['b'] is a NewRecarray and x['b'].info == x.info
>
> Is this the expected / proper behavior? Is there something wrong
> with the way I've subclassed recarray?

No, that's expected behavior. Once again, calling __getitem__ with a
field name as input calls __array_finalize__ internally.
__array_finalize__ transforms the output into an array with the same
subclass as your input: that's why x['b'] is a NewRecarray. However,
if the dtype of the output is built-in, it's transformed back to a
standard ndarray: that's why x['a'] is a standard ndarray.

> ---
>
> 3)a) If I have two recarrays with the same len and column headers,
> the __eq__ method returns the rich comparison. Why is the result a
> recarray rather than an ndarray?
>
> In [162]: x = np.rec.fromrecords([(1,'dd'), (2,'cc')],
> names=['a','b'])
> In [163]: y = np.rec.fromrecords([(1,'dd'), (2,'cc')],
> names=['a','b'])
> In [164]: x == y
> Out[164]: rec.array([ True, True], dtype=bool)

OK, as far as I understand, here's what's going on:
* First, we check whether the dtypes are compatible.
* Then, each field of x is compared to the corresponding field of y,
  which calls __array_finalize__ internally, and __array_wrap__
  (because you call the 'equal' ufunc).
* Then, __array_finalize__ is called on the output, which transforms
  it back to a recarray.

> 3)b) Suppose I have a subclass of recarray, NewRecarray, that
> attaches some new attribute, e.g. 'info'.
>
> x = NewRecarray(data)
> y = NewRecarray(data)
> z = x == y
>
> Then z is a NewRecarray object and z.info = x.info.
>
> Is this the expected / proper behavior? Is there something wrong
> with the way I've subclassed recarray? [Dan Yamins asked this a
> couple days ago]

To tell you whether there's something wrong, I'd need to see the code.
I'm not especially surprised by this behavior...

> ---
>
> 4) Suppose I have a subclass of np.ndarray, NewArray, that attaches
> some new attribute, e.g. 'info'. When I view a NewArray object as a
> ndarray, the result has no 'info' attribute. Is the memory
> corresponding to the 'info' attribute garbage collected? What
> happens to it?

It's alive! No, seriously: when you take a view as a ndarray, you only
access the portion of memory corresponding to the values of your
ndarray and none of its extra info.
Same thing as calling .__array__() on your object. So the information
is still accessible, as long as the initial object exists (Correct me
if I'm wrong on this one...)
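
For what it's worth, here is a minimal sketch of the view behavior in
point 4. The NewArray class below is a hypothetical stand-in, since I
haven't seen your code, and the 'info' handling follows the usual
pattern from the subclassing docs rather than anything specific to
your implementation:

```python
import numpy as np


class NewArray(np.ndarray):
    """Hypothetical ndarray subclass carrying an extra 'info' attribute."""

    def __new__(cls, data, info=None):
        # View the input data as an instance of the subclass.
        obj = np.asarray(data).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices: copy 'info' over from the parent.
        if obj is None:
            return
        self.info = getattr(obj, "info", None)


x = NewArray([1, 2, 3], info={"source": "example"})

# A slice goes through __array_finalize__ and keeps the attribute.
sliced = x[1:]
print(type(sliced).__name__, sliced.info)

# Viewing as a plain ndarray drops the attribute on the view only.
plain = x.view(np.ndarray)
print(type(plain).__name__, hasattr(plain, "info"))

# The original object still holds 'info', and both share the same data.
print(x.info, np.shares_memory(plain, x))
plain[0] = 10
print(x[0])
```

The 'info' object lives on the Python instance x, not in the array
buffer, so it stays around for as long as x (or anything else holding
a reference to it) does.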