On Sun, Feb 6, 2011 at 12:45 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> Robert Bradshaw, 04.02.2011 19:50:
>> On Sat, Jan 29, 2011 at 2:35 AM, Stefan Behnel wrote:
>>>> I am a bit concerned about the performance overhead of the Py_UCS4 to
>>>> Py_UNICODE coercion (e.g. if constructing a Py_UNICODE* by hand), but
>>>> maybe that's both uncommon and negligible.
>>>
>>> I think so. If users deal with Py_UNICODE explicitly, they'll likely type
>>> their respective variables anyway, so that there won't be an intermediate
>>> step through Py_UCS4. And on 32bit Unicode builds this isn't an issue at
>>> all.
>
> Coming back to this once more: if the PEP gets implemented, we will only
> know at C compile time (Py>=3.3 or not) if the result of indexing
> (including for-loop iteration) is Py_UCS4 or Py_UNICODE. For Cython's type
> inference, Py_UCS4 is therefore the more correct guess. So my proposal
> stands to always infer Py_UCS4 instead of Py_UNICODE for indexing, even if
> we ignore surrogate pairs in narrow Python builds.
>
> I will implement this for now, so that we can see what it gives.

Yes, that makes sense.

>>>> Also, this would be inconsistent with
>>>> Python-level slicing, indexing, and range, right?
>>>
>>> Yes, it does not match well with slicing and indexing. That's the problem
>>> with narrow builds in both CPython and Cython. Only the PEP can fix that by
>>> basically dropping the restrictions of a narrow build.
>>
>> Let's let indexing do what indexing does.
>
> Ok. So you'd continue to get whatever CPython returns for indexing, i.e.
> Py_UNICODE in Py<=3.2 and Py_UCS4 in Python versions that implement the
> PEP. That includes separate code points for surrogate pairs on narrow builds.

Yep, exactly. Note also that indexing that takes surrogate pairs into account
can be O(n) rather than O(1), since finding the n-th character means scanning
from the start of the string.

- Robert
_______________________________________________
Cython-dev mailing list
Cython-dev@codespeak.net
http://codespeak.net/mailman/listinfo/cython-dev
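
A minimal C sketch of what surrogate-aware indexing involves on a narrow build, assuming the string is stored as UTF-16 code units. Here uint16_t and uint32_t stand in for Py_UNICODE and Py_UCS4, and codepoint_at is a hypothetical helper, not a CPython or Cython API. It illustrates both points from the thread: a character encoded as a surrogate pair only fits in a Py_UCS4-sized value, and locating the n-th character requires walking from the start, so it costs O(n) instead of the O(1) of indexing by code unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint16_t plays the role of a narrow-build Py_UNICODE
   (one UTF-16 code unit), uint32_t the role of Py_UCS4 (one code point). */

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: return the index-th code point of a UTF-16 buffer,
   treating a valid surrogate pair as one character. Because characters
   have variable width (1 or 2 code units), the loop must scan from the
   start, which makes this O(n). Lone surrogates are returned unchanged.
   Returns 0xFFFFFFFF if index is out of range. */
static uint32_t codepoint_at(const uint16_t *s, size_t len, size_t index)
{
    size_t i = 0;
    while (i < len) {
        uint32_t cp = s[i];
        size_t width = 1;
        if (is_high_surrogate(s[i]) && i + 1 < len && is_low_surrogate(s[i + 1])) {
            /* Combine the pair into a code point above U+FFFF. */
            cp = 0x10000u + (((uint32_t)s[i] - 0xD800u) << 10)
                          + ((uint32_t)s[i + 1] - 0xDC00u);
            width = 2;
        }
        if (index == 0)
            return cp;
        index--;
        i += width;
    }
    return 0xFFFFFFFFu;
}
```

For the buffer {0x0061, 0xD83D, 0xDE00, 0x0062} ('a', U+1F600 as a surrogate pair, 'b'), codepoint_at(s, 4, 1) yields 0x1F600, whereas plain code-unit indexing s[2] yields the lone low surrogate 0xDE00, which corresponds to what u[2] returns for that string on a narrow CPython build.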