On Sep 14, 2009, at 12:05 PM, Stefan Behnel wrote:
> Robert Bradshaw wrote:
>> On Sep 13, 2009, at 12:39 PM, Stefan Behnel wrote:
>>>>> cdef str s = "some string"
>>>>> cdef char* cs = s
>>>>>
>>>> I'm inclined for a warning... and that warning would not be generated
>>>> in this case: "cdef char* cs = <bytes>s", right?
>>> Sure.
>>
>> That could be bad: <bytes>s doesn't actually do a typecheck,
>> especially if the bytes -> char* is eventually optimized. One should
>> do <bytes?>s or <object>s (neither of which generates a warning).
>
> To me, that's just like casting an int to a void*. I don't see a reason to
> special case some casts while we already allow all that dangerous C stuff.
> If nothing else, a cast is a clear way to say "I know better!". And if you
> actually do not know better, you'll see where that gets you. Not Cython's
> problem.
Yes. As I said, my point was only that we shouldn't encourage *this*
solution, as it doesn't do type checking.
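
To make the difference concrete, here is a rough pure-Python analogy, not
Cython's actual implementation. The helpers checked_cast and unchecked_cast
are hypothetical names invented for this sketch: the first stands in for a
checked cast such as <bytes?>s, the second for a plain <bytes>s that simply
trusts the caller.

```python
def checked_cast(obj, expected_type):
    """Hypothetical helper: return obj unchanged, or raise TypeError."""
    if not isinstance(obj, expected_type):
        raise TypeError(
            "expected %s, got %s" % (expected_type.__name__, type(obj).__name__)
        )
    return obj


def unchecked_cast(obj, expected_type):
    """Hypothetical helper: trust the caller and perform no check at all."""
    return obj


data = b"some string"
text = u"some string"

# Passes: data really is a bytes object.
ok = checked_cast(data, bytes)

# No error is raised, but the result is still a unicode object.
silently_wrong = unchecked_cast(text, bytes)

# The checked variant reports the mismatch instead of hiding it.
try:
    checked_cast(text, bytes)
    raised = False
except TypeError:
    raised = True
```

With the unchecked variant the mismatch only surfaces later, wherever the
value is actually used as bytes.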
>>> changing the argument/return value types from "object" to the
>>> right types will allow Cython to do actual type checking.
>>
>> Often the type checking will be redundant with the type checking that
>> happens inside the method, so I'm not so sure this is a good idea.
>
> I meant compile time type checking, which won't hurt performance but helps
> in making the C-API safer and also allows Cython to do some optimisations.
Sometimes. For example, PyUnicode_GetSize in principle takes a unicode
object, but is only typed to take an object. It performs its own
typecheck, so we should just define it as taking an object and not do
the redundant type check ourselves.
> For example, I only noticed recently that literal Python strings were
> always treated as "object" in Cython. So things like u"".join() were never
> associated with the unicode type.
Yes, if u"" is typed, we should be able to optimize on it.
>>>>> And "str", "bytes" and "unicode" wouldn't be assignable to each other,
>>>>> right? Or would you also leave that to runtime?
>>>> "bytes" <-> "unicode" (obviously?) would not be assignable, though for
>>>> the case of "bytes" <-> "str" or "str" <-> "unicode", we could
>>>> generate similar Cython compile warnings as for the "[unsigned] char*"
>>>> conversions.
>>> Yes, I guess that's a similar case.
>>
>> I'd be inclined to outright disallow them, favoring requiring a
>> <bytes?> or <unicode?> or <object> cast.
>
> Perfectly fine with me.
>
>
>> Currently, though, I can't think
>> of any reason to type str/bytes/unicode variables at all.
>
> You should take a look at the call optimisations for builtin types. I've
> been adding to them for a while now, and they really make a huge difference.
>
> For example, this:
>
> cdef unicode u = some_unicode_string
> s = u.encode('UTF-8')
>
> will now result in a straight C call to the UTF-8 encoder, instead of
> looking up the method, calling it, and having it look up the codec
> internally. I find that pretty cool.
Hmm, not for me (at least not in the -devel branch), but I could see
this being very nice.
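
As a rough illustration of the two call paths being compared, here is a
pure-Python sketch. It is only an analogy: the generated C code is not shown
in this thread, and codecs.utf_8_encode is used here merely as the closest
Python-level stand-in for calling the UTF-8 encoder directly. The sample
string is made up for the example.

```python
import codecs

u = u"Gr\u00fc\u00dfe"

# Generic path: look up the method on the object, then look up the
# codec by name inside the call.
via_method = u.encode("UTF-8")

# Direct path: call the UTF-8 encoder function itself.
# codecs.utf_8_encode returns a (bytes, characters_consumed) tuple.
via_direct, consumed = codecs.utf_8_encode(u)
```

Both paths produce the same bytes; the direct one just skips the method and
codec lookups, which is where the saving would come from once the variable
is known to be unicode.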
- Robert