On Sun, Feb 6, 2011 at 12:45 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> Robert Bradshaw, 04.02.2011 19:50:
>> On Sat, Jan 29, 2011 at 2:35 AM, Stefan Behnel wrote:
>>>> I am a bit concerned about the performance overhead of the Py_UCS4 to
>>>> Py_UNICODE coercion (e.g. if constructing a Py_UNICODE* by hand), but
>>>> maybe that's both uncommon and negligible.
>>>
>>> I think so. If users deal with Py_UNICODE explicitly, they'll likely type
>>> their respective variables anyway, so that there won't be an intermediate
>>> step through Py_UCS4. And on 32-bit Unicode builds this isn't an issue at
>>> all.
>
> Coming back to this once more: if the PEP gets implemented, we will only
> know at C compile time (Py>=3.3 or not) if the result of indexing
> (including for-loop iteration) is Py_UCS4 or Py_UNICODE. For Cython's type
> inference, Py_UCS4 is therefore the more correct guess. So my proposal
> stands to always infer Py_UCS4 instead of Py_UNICODE for indexing, even if
> we ignore surrogate pairs in narrow Python builds.
>
> I will implement this for now, so that we can see what it gives.

Yes, that makes sense.

>>>> Also, this would be inconsistent with
>>>> python-level slicing, indexing, and range, right?
>>>
>>> Yes, it does not match well with slicing and indexing. That's the problem
>>> with narrow builds in both CPython and Cython. Only the PEP can fix that by
>>> basically dropping the restrictions of a narrow build.
>>
>> Let's let indexing do what indexing does.
>
> Ok. So you'd continue to get whatever CPython returns for indexing, i.e.
> Py_UNICODE in Py<=3.2 and Py_UCS4 in Python versions that implement the
> PEP. That includes separate code points for surrogate pairs on narrow builds.

Yep, exactly. Note also that indexing that accounts for surrogate pairs
can be O(n) rather than O(1), since the position of the n-th code point
depends on how many pairs precede it.
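
To illustrate the cost, here is a rough pure-Python sketch (not Cython or
CPython code). The helper names utf16_units and code_point_at are made up
for this example. It models a narrow-build string as a list of 16-bit code
units and has to scan from the start to find the n-th code point:

```python
def utf16_units(text):
    """Return the UTF-16 code units of text, roughly as a narrow build stores them."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_point_at(units, index):
    """Return the index-th code point, joining surrogate pairs.

    This walks the units from the beginning, so it is O(n) in the index,
    unlike plain code-unit indexing, which is O(1).
    """
    if index < 0:
        raise IndexError("negative indices are not handled in this sketch")
    pos = 0
    n = len(units)
    while pos < n:
        unit = units[pos]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and pos + 1 < n and 0xDC00 <= units[pos + 1] <= 0xDFFF:
            # High + low surrogate: combine into one non-BMP code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[pos + 1] - 0xDC00)
            width = 2
        else:
            # BMP character (or a lone surrogate, passed through as-is).
            cp = unit
            width = 1
        if index == 0:
            return cp
        index -= 1
        pos += width
    raise IndexError("code point index out of range")


units = utf16_units("a\U0001D11Eb")
# Plain indexing sees four separate code units, including both surrogates.
print([hex(u) for u in units])
# Pair-aware indexing sees three code points.
print(hex(code_point_at(units, 1)))
```

Plain units[1] hands back only the high surrogate, which is what indexing
on a narrow build gives you today; the pair-aware lookup is what costs the
linear scan.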

- Robert