Vitja Makarov, 07.11.2011 19:28:
2011/11/6 Stefan Behnel:
Vitja Makarov, 06.11.2011 18:10:
When a file encoding is specified, Cython generates two PyObject entries for string constants: one for the variable name and one for the string constant.
That's because the content may actually become different after decoding, even if the encoded byte sequence is identical. Note that decoding is only done in Py3. In Py2, the byte sequence is used, so both values are identical.
If they are identical after decoding, isn't it better to have only one of them?
Well, yes. That's not trivial, though, because the decision is taken at C compile time. And the benefit tends to be negligible, because this case is really rare and the affected strings tend to be quite short.
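To make the decoding point concrete, here is a minimal Python 3 sketch. It is not Cython code: the helper name `decodes_identically` and the choice of UTF-8 as the comparison encoding are assumptions made purely for illustration. It shows that an ASCII-only byte sequence such as "wtf" yields the same text under both decodings, while a sequence containing high bytes does not.

```python
# Illustrative only: compare how one byte sequence decodes under the
# declared source encoding versus an assumed default (UTF-8 here).

def decodes_identically(raw, source_encoding, default_encoding="utf-8"):
    """Return True if `raw` decodes to the same text under both encodings."""
    try:
        return raw.decode(source_encoding) == raw.decode(default_encoding)
    except UnicodeDecodeError:
        # Bytes that are invalid in one encoding cannot match at all.
        return False

ascii_bytes = b"wtf"           # pure ASCII, as in the example in this thread
high_bytes = b"\xd7\xd4\xc6"   # arbitrary non-ASCII bytes, valid in koi8-r

same_for_ascii = decodes_identically(ascii_bytes, "koi8-r")
same_for_high = decodes_identically(high_bytes, "koi8-r")
print(same_for_ascii, same_for_high)
```

For short ASCII identifiers the two objects end up equal, which is why merging them would gain little in practice.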
Here is a minimal example:
    $ cat cplus.pyx
    # -*- coding: koi8-r -*-
    wtf = 'wtf'
It generates the following code:
    /* Implementation of 'cplus' */
    static char __pyx_k__wtf[] = "wtf";
    static char __pyx_k____main__[] = "__main__";
    static char __pyx_k____test__[] = "__test__";
    static PyObject *__pyx_n_s____main__;
    static PyObject *__pyx_n_s____test__;
    static PyObject *__pyx_n_s__wtf;
    static PyObject *__pyx_n_s__wtf;
    ...
    static __Pyx_StringTabEntry __pyx_string_tab[] = {
      {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1},
      {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1},
      {&__pyx_n_s__wtf, __pyx_k__wtf, sizeof(__pyx_k__wtf), "koi8-r", 0, 1, 1},
      {&__pyx_n_s__wtf, __pyx_k__wtf, sizeof(__pyx_k__wtf), 0, 0, 1, 1},
      {0, 0, 0, 0, 0, 0, 0}
    };
Both Python object variables should have different cnames.
What about adding an encoding suffix?
Yes, I think that would fix it, although it could be a bit misleading when reading the C code with a Py3 context in mind. But using a counter doesn't make it very readable, either.
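The encoding-suffix idea could be sketched roughly as follows. This is not Cython's actual naming code: the helper `string_cname`, its parameters, and the `__<encoding>` suffix format are all hypothetical, chosen only to show how the two colliding entries from the example would get distinct cnames.

```python
import re

def string_cname(prefix, value, encoding=None):
    """Build a C identifier for a string constant (hypothetical helper)."""
    # Replace anything that is not valid in a C identifier.
    cname = prefix + re.sub(r"[^A-Za-z0-9_]", "_", value)
    if encoding:
        # The suffix keeps the encoded entry distinct from the plain one.
        cname += "__" + re.sub(r"[^A-Za-z0-9]", "_", encoding)
    return cname

plain = string_cname("__pyx_n_s__", "wtf")
encoded = string_cname("__pyx_n_s__", "wtf", "koi8-r")
print(plain)
print(encoded)
```

A suffix keeps the origin of each name visible in the generated C code, which a bare counter would not, at the cost of the Py3-only meaning mentioned above.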
Stefan