Re: [Numpy-discussion] Fixing #736 and possible memory leak

Charles R Harris Thu, 24 Apr 2008 18:11:45 -0700

On Thu, Apr 24, 2008 at 5:58 PM, Robert Kern <[EMAIL PROTECTED]> wrote:


> On Thu, Apr 24, 2008 at 5:37 PM, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I've been looking into ticket #736 and playing with some things. In
> > arrayobject.c starting at line 8534 I added a check for strings.
> >
> >         if (PyString_Check(op)) {
> >             r = Array_FromPyScalar(op, newtype);
> >         }
> >         if (PySequence_Check(op)) {
> >             PyObject *thiserr = NULL;
> >
> >             /* necessary but not sufficient */
> >             Py_INCREF(newtype);
> >             r = Array_FromSequence(op, newtype, flags & FORTRAN,
> >                                     min_depth, max_depth);
> >             if (r == NULL && (thiserr=PyErr_Occurred())) {
> >                 if (PyErr_GivenExceptionMatches(thiserr,
> >                                                 PyExc_MemoryError)) {
> >                     return NULL;
> >                 }
> >
> > I think there may be a failure to decrement the reference to newtype
> > unless Array_FromSequence does that (nasty side effect).
> >
> > Anyway, the added check for a string fixes the conversion problem for
> > such things as int32('123'). There remains a problem with
> > array('123', dtype=int32) and with array(['123','123'], dtype=int32),
> > but I think I can track those down. The question is, will changing the
> > current behavior so that strings get converted to numbers cause
> > problems with other programs out there? I suspect I also need to check
> > that strings are converted this way only when the type is explicitly
> > given, not detected.
>
> Seems to work for me.
>
> In [5]: array([124, '123', '123'])
> Out[5]:
> array(['124', '123', '123'],
>      dtype='|S4')


Sure, but you didn't specify the type, so numpy inferred a string dtype
from the data. That's the wrong test. Try:

In [1]: array(['123'], dtype=int32)
Out[1]: array([[1, 2, 3]])

In [2]: a = ones(3, dtype=int32)

In [3]: a[...] = '123'

In [4]: a
Out[4]: array([1, 2, 3])

In [5]: a[...] = int32('123')

In [6]: a
Out[6]: array([123, 123, 123])

And so on. The problem is this bit of code (among others):

    stop_at_string = ((type == PyArray_OBJECT) ||
                      (type == PyArray_STRING &&
                       typecode->type == PyArray_STRINGLTR) ||
                      (type == PyArray_UNICODE) ||
                      (type == PyArray_VOID));


The question is, how do we interpret a string when the type is specified? I
think in that case we should try to convert the string to the relevant type,
just as we cast numbers to the relevant type. So we should always stop at
strings, treating each one as a single scalar rather than descending into
its characters.
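
To make the difference concrete, here is a rough pure-Python sketch of the
two interpretations. It is not the numpy C code; discover_shape and
to_int_nested are hypothetical names made up for illustration, and the
stop_at_string argument only mimics the role of the flag quoted above.

```python
def discover_shape(obj, stop_at_string):
    """Return the nested shape of obj.

    If stop_at_string is true, a str counts as one scalar; otherwise it is
    treated as a sequence of single characters.
    """
    if isinstance(obj, str):
        return () if stop_at_string else (len(obj),)
    if isinstance(obj, (list, tuple)):
        if not obj:
            return (0,)
        # Assume a regular (non-ragged) nesting and look at the first item.
        return (len(obj),) + discover_shape(obj[0], stop_at_string)
    return ()


def to_int_nested(obj, stop_at_string):
    """Convert obj to nested lists of int under the chosen interpretation."""
    if isinstance(obj, str):
        if stop_at_string:
            return int(obj)               # '123' -> 123
        return [int(ch) for ch in obj]    # '123' -> [1, 2, 3]
    if isinstance(obj, (list, tuple)):
        return [to_int_nested(item, stop_at_string) for item in obj]
    return int(obj)


# Descending into the string reproduces the surprising result shown earlier.
print(to_int_nested(['123'], stop_at_string=False))   # [[1, 2, 3]]
# Stopping at the string gives the conversion proposed here.
print(to_int_nested(['123'], stop_at_string=True))    # [123]
```

With stop_at_string false, ['123'] has shape (1, 3), which matches the
array([[1, 2, 3]]) output above; with it true the shape is (1,) and the
string is cast as a whole.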

Chuck

