Robert Bradshaw, 06.02.2011 10:14:
> On Sun, Feb 6, 2011 at 12:45 AM, Stefan Behnel<stefan...@behnel.de> wrote:
>> Robert Bradshaw, 04.02.2011 19:50:
>>> On Sat, Jan 29, 2011 at 2:35 AM, Stefan Behnel wrote:
>>>>> I am a bit concerned about the performance overhead of the Py_UCS4 to
>>>>> Py_UNICODE coercion (e.g. if constructing a Py_UNICODE* by hand), but
>>>>> maybe that's both uncommon and negligible.
>>>>
>>>> I think so. If users deal with Py_UNICODE explicitly, they'll likely type
>>>> their respective variables anyway, so that there won't be an intermediate
>>>> step through Py_UCS4. And on 32bit Unicode builds this isn't an issue at
>>>> all.
>>
>> Coming back to this once more: if the PEP gets implemented, we will only
>> know at C compile time (Py>=3.3 or not) if the result of indexing
>> (including for-loop iteration) is Py_UCS4 or Py_UNICODE. For Cython's type
>> inference, Py_UCS4 is therefore the more correct guess. So my proposal
>> stands to always infer Py_UCS4 instead of Py_UNICODE for indexing, even if
>> we ignore surrogate pairs in narrow Python builds.
>>
>> I will implement this for now, so that we can see what it gives.
>
> Yes, that makes sense.

Done.

>>>>> Also, this would be inconsistent with
>>>>> python-level slicing, indexing, and range, right?
>>>>
>>>> Yes, it does not match well with slicing and indexing. That's the problem
>>>> with narrow builds in both CPython and Cython. Only the PEP can fix that by
>>>> basically dropping the restrictions of a narrow build.
>>>
>>> Let's let indexing do what indexing does.
>>
>> Ok. So you'd continue to get whatever CPython returns for indexing, i.e.
>> Py_UNICODE in Py<=3.2 and Py_UCS4 in Python versions that implement the
>> PEP. That includes separate code points for surrogate pairs on narrow builds.
>
> Yep, exactly. Note that indexing taking into account surrogate pairs
> can be O(n) rather than O(1) as well.

Sure, that was almost certainly the reason why the way indexing works wasn't
changed when surrogate pair support was implemented in the codecs, in print etc.

Stefan
_______________________________________________
Cython-dev mailing list
Cython-dev@codespeak.net
http://codespeak.net/mailman/listinfo/cython-dev
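
P.S. To illustrate the O(n) vs. O(1) point about surrogate-aware indexing, here is a rough sketch in plain Python. It only models a narrow build's storage as a list of UTF-16 code units; the helper names (to_utf16_units, index_code_unit, index_code_point) are made up for this example and are not part of CPython or Cython.

```python
def to_utf16_units(text):
    """Model narrow-build storage: the text as a list of UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def index_code_unit(units, i):
    """O(1): plain indexing, may return one half of a surrogate pair."""
    return units[i]


def index_code_point(units, i):
    """O(n): surrogate-aware indexing has to scan from the start,
    because earlier pairs shift the position of the i-th code point."""
    pos = 0
    seen = 0
    while pos < len(units):
        unit = units[pos]
        is_pair = (0xD800 <= unit <= 0xDBFF
                   and pos + 1 < len(units)
                   and 0xDC00 <= units[pos + 1] <= 0xDFFF)
        if seen == i:
            if is_pair:
                # Combine high and low surrogate into one code point.
                return (0x10000 + ((unit - 0xD800) << 10)
                        + (units[pos + 1] - 0xDC00))
            return unit
        pos += 2 if is_pair else 1
        seen += 1
    raise IndexError("code point index out of range")


units = to_utf16_units("a\U0001F600b")
print(hex(index_code_unit(units, 1)))   # high surrogate only
print(hex(index_code_point(units, 1)))  # the full code point
```

With the current behaviour (the first function), index 1 gives a lone surrogate; the second function gives the whole code point, but only after walking the buffer, which is the cost mentioned above.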